Advanced signaling of regions of interest in omnidirectional visual media
Patent abstract:
In various implementations, modifications and/or additions to the ISOBMFF can indicate that a file that has been formatted according to the ISOBMFF, or a format derived from the ISOBMFF, includes virtual reality content. The file can include a restricted scheme information box, written into a track box in the file. The restricted scheme information box can indicate a virtual reality scheme for the track's contents. For example, a signaling mechanism can indicate a most-viewed viewport of the virtual reality data.

Publication number: BR112019019287A2
Application number: R112019019287
Filing date: 2018-03-23
Publication date: 2020-04-14
Inventor: Wang Yekui
Applicant: Qualcomm Inc
Primary IPC class:
Patent description:
ADVANCED SIGNALING OF REGIONS OF INTEREST IN OMNIDIRECTIONAL VISUAL MEDIA

PRIORITY CLAIM UNDER 35 U.S.C. §119

[0001] This patent application claims priority to Provisional Patent Application No. 62/475,714, entitled Advanced signalling of regions of interest in omnidirectional visual media, filed on March 23, 2017, and to Non-Provisional Patent Application No. 15/927,799, filed on March 21, 2018, assigned to the assignee hereof and hereby expressly incorporated by reference.

BACKGROUND

Field

[0002] This patent application relates to the storage and processing of virtual reality (VR) video content in one or more media file formats, such as the ISO base media file format (ISOBMFF) and/or file formats derived from the ISOBMFF.

Background

[0003] Video coding standards include ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual, ITU-T H.264 or ISO/IEC MPEG-4 AVC, including its scalable video coding extension known as Scalable Video Coding (SVC) and its Multiview Video Coding (MVC) extension, and High Efficiency Video Coding (HEVC), also known as ITU-T H.265 and ISO/IEC 23008-2, including its scalable coding extension (i.e., Scalable High Efficiency Video Coding, SHVC) and its multiview extension (i.e., Multiview High Efficiency Video Coding, MV-HEVC).

SUMMARY

[0004] In some embodiments, techniques are described for indicating in a file that the file includes virtual reality content, so that video player devices can properly render and/or ignore the virtual reality content.

[0005] According to one example, a method for decoding and displaying virtual reality data is discussed. The method may include receiving a file containing virtual reality data, in which the virtual reality data represents a 360 degree view of a virtual environment; extracting the virtual reality data from the file, where the virtual reality data is stored in the file according to a file format, where the file format specifies the position within the file of the virtual reality data and the position within the file of information associated with the virtual reality data, and where the information associated with the virtual reality data is stored inside a track box; extracting a sample entry from the track box, where the sample entry is associated with one or more samples, and where the sample entry indicates that the track is a timed metadata track that contains information on a most-viewed viewport associated with the virtual reality data; and decoding and rendering the virtual reality data for display to a user. The information on the most-viewed viewport associated with the virtual reality data can comprise data identifying a shape type and data identifying a sphere-region viewport specified by four great circles. The information on the most-viewed viewport associated with the virtual reality data can comprise data identifying a shape type and data identifying a spherical rectangular viewport specified by two yaw circles and two pitch circles. The most-viewed viewport can be associated with a presentation time of the virtual reality data to the user. The most-viewed viewport associated with the virtual reality data can be selected from the group consisting of: a viewport completely covered by a set of the most-requested picture regions based on viewing statistics of the virtual reality data at the presentation time, a recommended viewport for displaying the virtual reality data, a default viewport absent user control of a viewing orientation of the virtual reality data, a viewport defined by the director of the virtual reality data, and a viewport defined by the producer of the virtual reality data. Extracting the virtual reality data from the file can comprise extracting the virtual reality data from one or more media tracks in the file. The virtual reality data can be rendered and displayed using the information on the most-viewed viewport associated with the virtual reality data. The file format can be based on an International Organization for Standardization (ISO) base media file format.
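As an informal illustration of the spherical rectangular viewport referred to above, the following sketch (Python; the field names center_yaw, center_pitch, yaw_range and pitch_range are assumptions made for illustration, not syntax elements defined by this disclosure) models a region bounded by two yaw circles and two pitch circles and checks whether a viewing direction falls inside it.

    from dataclasses import dataclass

    @dataclass
    class RectSphereViewport:
        # All angles in degrees; field names are illustrative assumptions.
        center_yaw: float     # -180..180, center of the region on the sphere
        center_pitch: float   # -90..90
        yaw_range: float      # full width between the two bounding yaw circles
        pitch_range: float    # full height between the two bounding pitch circles

        def contains(self, yaw: float, pitch: float) -> bool:
            """True if the viewing direction (yaw, pitch) lies between the two
            yaw circles and the two pitch circles that bound the viewport."""
            # Wrap the yaw difference into -180..180 so the region may straddle +/-180.
            dyaw = (yaw - self.center_yaw + 180.0) % 360.0 - 180.0
            dpitch = pitch - self.center_pitch
            return abs(dyaw) <= self.yaw_range / 2.0 and abs(dpitch) <= self.pitch_range / 2.0

    # A 90 x 60 degree most-viewed viewport centered slightly above the equator.
    viewport = RectSphereViewport(center_yaw=30.0, center_pitch=10.0,
                                  yaw_range=90.0, pitch_range=60.0)
    print(viewport.contains(50.0, 0.0))    # True: inside the region
    print(viewport.contains(-150.0, 0.0))  # False: outside in yaw

A region specified by four great circles would require a slightly different containment test, since its edges do not follow lines of constant yaw or pitch.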
[0006] According to one example, a device for decoding and displaying virtual reality data is discussed. The device may include a receiver configured to receive a file containing virtual reality data, in which the virtual reality data represents a 360 degree view of a virtual environment; and a processor configured to extract the virtual reality data from the file, where the virtual reality data is stored in the file according to a file format, where the file format specifies the position within the file of the virtual reality data and the position within the file of information associated with the virtual reality data, and where the information associated with the virtual reality data is stored inside a track box; extract a sample entry from the track box, where the sample entry is associated with one or more samples, and where the sample entry indicates that the track is a timed metadata track that contains information on a most-viewed viewport associated with the virtual reality data; and decode and render the virtual reality data for display to a user. The information on the most-viewed viewport associated with the virtual reality data comprises data identifying a shape type and data identifying a sphere-region viewport specified by four great circles. The information on the most-viewed viewport associated with the virtual reality data comprises data identifying a shape type and data identifying a spherical rectangular viewport specified by two yaw circles and two pitch circles. The most-viewed viewport can be associated with a presentation time of the virtual reality data to the user.
The virtual reality data can be rendered and displayed using the information on the most-viewed viewport associated with the virtual reality data. The file format can be based on an International Organization for Standardization (ISO) base media file format.

[0007] According to another example, a method of storing virtual reality data is discussed. The method may include obtaining virtual reality data, in which the virtual reality data represents a 360 degree view of a virtual environment; storing the virtual reality data in a file, where the virtual reality data is stored in the file according to a file format, where the file format specifies the position within the file of the virtual reality data and the position within the file of information associated with the virtual reality data, and where the information associated with the virtual reality data is stored inside a track box; and storing a sample entry in the track box, where the sample entry is associated with one or more samples, and where the sample entry indicates that the track is a timed metadata track that contains information on a most-viewed viewport associated with the virtual reality data. The information on the most-viewed viewport associated with the virtual reality data comprises data identifying a shape type and data identifying a sphere-region viewport specified by four great circles. The information on the most-viewed viewport associated with the virtual reality data comprises data identifying a shape type and data identifying a spherical rectangular viewport specified by two yaw circles and two pitch circles. The most-viewed viewport can be associated with a presentation time of the virtual reality data to a user. The most-viewed viewport associated with the virtual reality data can be selected from the group consisting of: a viewport completely covered by a set of the most-requested picture regions based on viewing statistics of the virtual reality data at the presentation time, a recommended viewport for displaying the virtual reality data, a default viewport absent user control of a viewing orientation of the virtual reality data, a viewport defined by the director of the virtual reality data, and a viewport defined by the producer of the virtual reality data. Extracting the virtual reality data from the file can comprise extracting the virtual reality data from one or more media tracks in the file. The file format can be based on an International Organization for Standardization (ISO) base media file format.

[0008] According to another example, an apparatus for storing virtual reality data is discussed. The apparatus may include a receiver configured to obtain virtual reality data, in which the virtual reality data represents a 360 degree view of a virtual environment; and a processor configured to store the virtual reality data in a file, where the virtual reality data is stored in the file according to a file format, where the file format specifies the position within the file of the virtual reality data and the position within the file of information associated with the virtual reality data, and where the information associated with the virtual reality data is stored inside a track box;
and store a sample entry in the track box, where the sample entry is associated with one or more samples, and where the sample entry indicates that the track is a timed metadata track that contains information on a most-viewed viewport associated with the virtual reality data. The information on the most-viewed viewport associated with the virtual reality data comprises data identifying a shape type and data identifying a sphere-region viewport specified by four great circles. The information on the most-viewed viewport associated with the virtual reality data comprises data identifying a shape type and data identifying a spherical rectangular viewport specified by two yaw circles and two pitch circles. The most-viewed viewport can be associated with a presentation time of the virtual reality data to a user. The most-viewed viewport associated with the virtual reality data can be selected from the group consisting of: a viewport completely covered by a set of the most-requested picture regions based on viewing statistics of the virtual reality data at the presentation time, a recommended viewport for displaying the virtual reality data, a default viewport absent user control of a viewing orientation of the virtual reality data, a viewport defined by the director of the virtual reality data, and a viewport defined by the producer of the virtual reality data. Extracting the virtual reality data from the file can comprise extracting the virtual reality data from one or more media tracks in the file. The file format can be based on an International Organization for Standardization (ISO) base media file format.

[0009] According to another example, a non-transitory computer-readable medium containing instructions for causing a computer to perform a method is discussed. The method may include receiving a file containing virtual reality data, in which the virtual reality data represents a 360 degree view of a virtual environment; extracting the virtual reality data from the file, where the virtual reality data is stored in the file according to a file format, where the file format specifies the position within the file of the virtual reality data and the position within the file of information associated with the virtual reality data, and where the information associated with the virtual reality data is stored inside a track box; extracting a sample entry from the track box, where the sample entry is associated with one or more samples, and where the sample entry indicates that the track is a timed metadata track that contains information on a most-viewed viewport associated with the virtual reality data; and decoding and rendering the virtual reality data for display to a user. The information on the most-viewed viewport associated with the virtual reality data can comprise data identifying a shape type and data identifying a sphere-region viewport specified by four great circles. The information on the most-viewed viewport associated with the virtual reality data can comprise data identifying a shape type and data identifying a spherical rectangular viewport specified by two yaw circles and two pitch circles. The most-viewed viewport can be associated with a presentation time of the virtual reality data to the user.
The most-viewed viewport associated with the virtual reality data can be selected from the group consisting of: a viewport completely covered by a set of the most-requested picture regions based on viewing statistics of the virtual reality data at the presentation time, a recommended viewport for displaying the virtual reality data, a default viewport absent user control of a viewing orientation of the virtual reality data, a viewport defined by the director of the virtual reality data, and a viewport defined by the producer of the virtual reality data. Extracting the virtual reality data from the file can comprise extracting the virtual reality data from one or more media tracks in the file. The virtual reality data can be rendered and displayed using the information on the most-viewed viewport associated with the virtual reality data. The file format can be based on an International Organization for Standardization (ISO) base media file format.

[0010] According to one example, an apparatus for decoding and displaying virtual reality data is discussed. The apparatus may include receiving means configured to receive a file containing virtual reality data, in which the virtual reality data represents a 360 degree view of a virtual environment; and a processor configured to extract the virtual reality data from the file, where the virtual reality data is stored in the file according to a file format, where the file format specifies the position within the file of the virtual reality data and the position within the file of information associated with the virtual reality data, and where the information associated with the virtual reality data is stored inside a track box; extract a sample entry from the track box, where the sample entry is associated with one or more samples, and where the sample entry indicates that the track is a timed metadata track that contains information on a most-viewed viewport associated with the virtual reality data; and decode and render the virtual reality data for display to a user. The information on the most-viewed viewport associated with the virtual reality data comprises data identifying a shape type and data identifying a sphere-region viewport specified by four great circles. The information on the most-viewed viewport associated with the virtual reality data comprises data identifying a shape type and data identifying a spherical rectangular viewport specified by two yaw circles and two pitch circles. The most-viewed viewport can be associated with a presentation time of the virtual reality data to the user. The most-viewed viewport associated with the virtual reality data can be selected from the group consisting of: a viewport completely covered by a set of the most-requested picture regions based on viewing statistics of the virtual reality data at the presentation time, a recommended viewport for displaying the virtual reality data, a default viewport absent user control of a viewing orientation of the virtual reality data, a viewport defined by the director of the virtual reality data, and a viewport defined by the producer of the virtual reality data. Extracting the virtual reality data from the file can comprise extracting the virtual reality data from one or more media tracks in the file.
The virtual reality data can be rendered and displayed using the information on the most-viewed viewport associated with the virtual reality data. The file format can be based on an International Organization for Standardization (ISO) base media file format.

[0011] According to another example, a non-transitory computer-readable medium containing instructions for causing a computer to perform a method is discussed. The method may include obtaining virtual reality data, in which the virtual reality data represents a 360 degree view of a virtual environment; storing the virtual reality data in a file, where the virtual reality data is stored in the file according to a file format, where the file format specifies the position within the file of the virtual reality data and the position within the file of information associated with the virtual reality data, and where the information associated with the virtual reality data is stored inside a track box; and storing a sample entry in the track box, where the sample entry is associated with one or more samples, and where the sample entry indicates that the track is a timed metadata track that contains information on a most-viewed viewport associated with the virtual reality data. The information on the most-viewed viewport associated with the virtual reality data comprises data identifying a shape type and data identifying a sphere-region viewport specified by four great circles. The information on the most-viewed viewport associated with the virtual reality data comprises data identifying a shape type and data identifying a spherical rectangular viewport specified by two yaw circles and two pitch circles. The most-viewed viewport can be associated with a presentation time of the virtual reality data to a user. The most-viewed viewport associated with the virtual reality data can be selected from the group consisting of: a viewport completely covered by a set of the most-requested picture regions based on viewing statistics of the virtual reality data at the presentation time, a recommended viewport for displaying the virtual reality data, a default viewport absent user control of a viewing orientation of the virtual reality data, a viewport defined by the director of the virtual reality data, and a viewport defined by the producer of the virtual reality data. Extracting the virtual reality data from the file can comprise extracting the virtual reality data from one or more media tracks in the file. The file format can be based on an International Organization for Standardization (ISO) base media file format.

According to another example, an apparatus for storing virtual reality data is discussed.
The apparatus may include receiving means configured to obtain virtual reality data, in which the virtual reality data represents a 360 degree view of a virtual environment; and a processor configured to store the virtual reality data in a file, where the virtual reality data is stored in the file according to a file format, where the file format specifies the position within the file of the virtual reality data and the position within the file of information associated with the virtual reality data, and where the information associated with the virtual reality data is stored inside a track box; and store a sample entry in the track box, where the sample entry is associated with one or more samples, and where the sample entry indicates that the track is a timed metadata track that contains information on a most-viewed viewport associated with the virtual reality data. The information on the most-viewed viewport associated with the virtual reality data comprises data identifying a shape type and data identifying a sphere-region viewport specified by four great circles. The information on the most-viewed viewport associated with the virtual reality data comprises data identifying a shape type and data identifying a spherical rectangular viewport specified by two yaw circles and two pitch circles. The most-viewed viewport can be associated with a presentation time of the virtual reality data to a user. The most-viewed viewport associated with the virtual reality data can be selected from the group consisting of: a viewport completely covered by a set of the most-requested picture regions based on viewing statistics of the virtual reality data at the presentation time, a recommended viewport for displaying the virtual reality data, a default viewport absent user control of a viewing orientation of the virtual reality data, a viewport defined by the director of the virtual reality data, and a viewport defined by the producer of the virtual reality data. Extracting the virtual reality data from the file can comprise extracting the virtual reality data from one or more media tracks in the file. The file format can be based on an International Organization for Standardization (ISO) base media file format.

[0012] This summary is not intended to identify key or essential features of the claimed subject matter, nor should it be used in isolation to determine the scope of the claimed subject matter. The subject matter should be understood by reference to appropriate portions of the entire specification of this patent, any or all drawings, and each claim.

[0013] The foregoing, together with other features and embodiments, will become more apparent upon reference to the following specification, claims and accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

[0014] Illustrative embodiments of the present invention are described in detail below with reference to the following figures:

[0015] Figure 1 is a block diagram illustrating an example of a system including an encoding device and a decoding device.

[0016] Figure 2 illustrates an example of an ISO base media file that contains data and metadata for a video presentation, formatted according to the ISOBMFF.
[0017] Figure 3A and Figure 3B illustrate examples where a top-level box in an ISO base media file is used to indicate that the file includes virtual reality content.

[0018] Figure 4 illustrates an example where a movie-level indication is used in an ISO base media file 400 to indicate that the file includes virtual reality content.

[0019] Figure 5 illustrates an example where a movie-level indicator is used in an ISO base media file 400 to indicate that the file includes virtual reality content.

[0020] Figure 6 illustrates an example of an ISO base media file where a handler box is used to signal that the contents of a track include virtual reality video.

[0021] Figure 7 illustrates an example of an ISO base media file in which a new handler box has been defined to indicate that the track includes virtual reality content.

[0022] Figure 8 illustrates an example of a media box that can be included in an ISO base media file.

[0023] Figure 9 illustrates an example of a process for generating a file containing virtual reality content.

[0024] Figure 10 illustrates an example of a process for extracting virtual reality content from a file.

[0025] Figure 11 illustrates an example of a process for decoding and rendering a virtual reality environment.

[0026] Figure 12 is a block diagram illustrating an exemplary encoding device that can implement one or more of the techniques described in this invention.

[0027] Figure 13 is a block diagram illustrating an exemplary decoding device.

DETAILED DESCRIPTION

[0028] A mechanism is described for signaling a most-viewed viewport in virtual reality content. The most-viewed viewport can be a viewport completely covered by a set of most-requested picture regions. The most-requested picture regions may be the regions statistically most likely to be requested or rendered to a user in a presentation. For example, these regions may include regions of high user interest within the virtual reality content at the presentation time. In another implementation, the most-viewed viewport may be a viewport that should be displayed when the user has no control, or has given up control, of a viewing orientation. The most-viewed viewport metadata and the corresponding most-requested picture regions metadata can be associated with each other and help a video display device understand which region on the spherical surface of the virtual reality content has been most requested and viewed.

[0029] A new sample entry type can be defined, indicated for example by the four-character code (4CC) mvvp. This sample entry type indicates that the track is a timed metadata track that contains information on a most-viewed viewport. Various types of viewports can be indicated, such as a sphere region specified by four great circles, or a sphere region specified by two yaw circles and two pitch circles.
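To make the idea of such a timed metadata sample concrete, the sketch below packs and unpacks a hypothetical most-viewed-viewport sample. The byte layout, the field names and the 0.01 degree units are illustrative assumptions only, not the normative syntax of any standard; only the use of a four-character code such as mvvp for the sample entry follows the description above.

    import struct

    SAMPLE_ENTRY_TYPE = b'mvvp'   # four-character code of the timed metadata sample entry

    # Assumed shape_type coding for this sketch:
    #   0 = sphere region bounded by four great circles
    #   1 = sphere region bounded by two yaw circles and two pitch circles
    def pack_viewport_sample(shape_type, center_yaw, center_pitch, yaw_range, pitch_range):
        """Pack one most-viewed-viewport sample: a shape type byte followed by four
        signed 32-bit angles in units of 0.01 degrees (illustrative layout)."""
        to_fixed = lambda deg: int(round(deg * 100))
        return struct.pack('>Biiii', shape_type, to_fixed(center_yaw), to_fixed(center_pitch),
                           to_fixed(yaw_range), to_fixed(pitch_range))

    def unpack_viewport_sample(data):
        shape_type, cy, cp, yr, pr = struct.unpack('>Biiii', data)
        return {'shape_type': shape_type, 'center_yaw': cy / 100.0, 'center_pitch': cp / 100.0,
                'yaw_range': yr / 100.0, 'pitch_range': pr / 100.0}

    sample = pack_viewport_sample(1, 30.0, 10.0, 90.0, 60.0)
    print(len(sample), unpack_viewport_sample(sample))
    # 17 {'shape_type': 1, 'center_yaw': 30.0, 'center_pitch': 10.0, 'yaw_range': 90.0, 'pitch_range': 60.0}

In a real file, each such sample would be carried in the timed metadata track and associated, through the track's timing structures, with the presentation time of the corresponding virtual reality video samples.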
[0030] Virtual reality (VR) describes a three-dimensional, computer-generated environment that can be interacted with in a seemingly real or physical way. In general, a user experiencing a virtual reality environment uses electronic equipment, such as a head-mounted display (HMD) and optionally also clothing (for example, gloves equipped with sensors), to interact with the virtual environment. As the user moves in the real world, the images rendered in the virtual environment also change, giving the user the sensation of moving within the virtual environment. In some cases, the virtual environment includes sound that correlates with the user's movements, giving the user the impression that the sounds originate from a particular direction or source. Virtual reality video can be captured and rendered in very high quality, potentially providing a truly immersive virtual reality experience. Virtual reality applications include gaming, training, education, sports video and online shopping, among others.

[0031] A virtual reality system typically includes a video capture device and a video display device, and possibly also other intermediate devices such as servers, data storage and data transmission equipment. A video capture device can include a camera set, that is, a set of multiple cameras, each oriented in a different direction and capturing a different point of view. As few as six cameras can be used to capture a full 360 degree view centered on the camera set's location. Some video capture devices may use fewer cameras, such as video capture devices that capture primarily side-to-side views. A video generally includes frames, where a frame is an electronically encoded still image of a scene. Cameras capture a certain number of frames per second, which is commonly referred to as the camera's frame rate.

[0032] To provide a seamless 360 degree view, the video captured by each of the cameras in the camera set typically undergoes image stitching. Image stitching in the case of 360 degree video generation involves combining or merging video frames from adjacent cameras in the area where the video frames overlap or otherwise connect. The result would be an approximately spherical frame but, similar to a Mercator projection, the merged data is usually represented in a planar fashion. For example, the pixels in a merged video frame can be mapped onto the planes of a cube shape, or onto some other three-dimensional planar shape (for example, a pyramid, an octahedron, a decahedron, etc.). Video capture and video display devices generally operate on a raster principle, meaning that a video frame is treated as a grid of pixels, so square or rectangular planes are typically used to represent a spherical environment.

[0033] Virtual reality video frames, mapped to a planar representation, can be encoded and/or compressed for storage and/or transmission. Encoding and/or compression can be performed using a video codec (for example, an H.265/HEVC compliant codec, an H.264/AVC compliant codec, or another suitable codec) and results in a compressed video bitstream or group of bitstreams. Encoding of video data using a video codec is described in more detail below.

[0034] The encoded video bitstream(s) can be stored and/or encapsulated in a media format or file format. The stored bitstream(s) can be transmitted, for example, over a network to a receiving device that can decode and render the video for display. Such a receiving device may be referred to herein as a video display device. For example, a virtual reality system can generate encapsulated files from the encoded video data (for example, using an International Organization for Standardization (ISO) base media file format and/or derived file formats). For example, the video codec can encode the video data, and an encapsulation engine can generate the media files by encapsulating the video data in one or more ISO format media files. Alternatively or additionally, the stored bitstream(s) can be provided directly from a storage medium to a receiving device.
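Because the encapsulation just described follows the ISOBMFF box structure (each box starts with a 32-bit size and a four-character type, as in the file illustrated in Figure 2), a minimal sketch of listing the top-level boxes of such a file is shown below. The file name is a placeholder; a typical file would report an ftyp box, a moov box and one or more mdat boxes.

    import struct

    def iter_top_level_boxes(path):
        """Yield (box_type, box_size) for each top-level box of an ISOBMFF file.
        Each box starts with a 32-bit big-endian size and a 4-byte type; a size of 1
        means a 64-bit size follows, and a size of 0 means the box runs to end of file."""
        with open(path, 'rb') as f:
            while True:
                header = f.read(8)
                if len(header) < 8:
                    return
                size, box_type = struct.unpack('>I4s', header)
                header_len = 8
                if size == 1:                       # 64-bit largesize follows the type
                    size = struct.unpack('>Q', f.read(8))[0]
                    header_len = 16
                yield box_type.decode('ascii', 'replace'), size
                if size == 0:                       # box extends to the end of the file
                    return
                f.seek(size - header_len, 1)        # skip the box payload

    # Usage (the path is a placeholder for an ISOBMFF / MP4 file on disk):
    # for box_type, size in iter_top_level_boxes('example.mp4'):
    #     print(box_type, size)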
[0035] A receiving device can also implement a codec to decode and/or decompress an encoded video bitstream. The receiving device can support the media or file format that was used to pack the video bitstream into a file (or files), and can extract the video (and possibly also audio) data to generate the encoded video data. For example, the receiving device can parse the media files containing the encapsulated video data to generate the encoded video data, and the codec on the receiving device can decode the encoded video data.

[0036] The receiving device can then send the decoded video signal to a rendering device (for example, a video display device). Rendering devices include, for example, head-mounted displays, virtual reality televisions and other 180 or 360 degree display devices. Generally, a head-mounted display is capable of tracking the movement of a user's head and/or the movement of a user's eyes. The head-mounted display can use the tracking information to render the portion of a 360 degree video that corresponds to the direction in which the user is looking, so that the user experiences the virtual environment the same way he or she would experience the real world. A rendering device can render a video at the same frame rate at which the video was captured, or at a different frame rate.

[0037] File format standards can define the format for packing and unpacking video (and possibly also audio) data into one or more files. File format standards include the International Organization for Standardization (ISO) base media file format (ISOBMFF, defined in ISO/IEC 14496-12) and other file formats derived from the ISOBMFF, including the Moving Picture Experts Group (MPEG) MPEG-4 file format (defined in ISO/IEC 14496-14), the Third Generation Partnership Project (3GPP) file format (defined in 3GPP TS 26.244), and the Advanced Video Coding (AVC) file format and High Efficiency Video Coding (HEVC) file format (both defined in ISO/IEC 14496-15). Draft texts of recent new editions of ISO/IEC 14496-12 and 14496-15 are available at http://phenix.int-evry.fr/mpeg/doc_end_user/documents/111_Geneva/wg11/w15177-v6-w15177.zip and http://phenix.int-evry.fr/mpeg/doc_end_user/documents/112_Warsaw/wg11/w15479-v2-w15479.zip, respectively.

[0038] The ISOBMFF is used as the basis for many codec encapsulation formats (for example, the AVC file format or any other suitable codec encapsulation format), as well as for many multimedia container formats (for example, the MPEG-4 file format, the 3GPP file format (3GP), the DVB file format, or any other suitable multimedia container format). ISOBMFF-based file formats can be used for continuous media, which is also referred to as streaming media.

[0039] In addition to continuous media (for example, audio and video), static media (for example, images) and metadata can be stored in an ISOBMFF-compliant file.
Files structured according to the ISOBMFF can be used for many purposes, including local playback of media files, progressive download of a remote file, as segments for Dynamic Adaptive Streaming over HTTP (DASH), as containers for content to be streamed (in which case the containers include packetization instructions), for recording of received media streams in real time, or for other uses.

[0040] The ISOBMFF and its derived file formats (for example, the AVC file format or other derived file formats) are widely used for storage and encapsulation of media content (for example, including video, audio and text) in many multimedia applications. The ISOBMFF and the file formats derived from the ISOBMFF, however, do not include specifications for storing virtual reality (VR) video. For example, if a virtual reality video is stored in a file based on the ISOBMFF or a derived file format, a playback device may treat (for example, may attempt to process) the virtual reality video as a conventional planar video (for example, the playback device may treat the virtual reality video as not including virtual reality content). The playback device may therefore not apply the necessary projection of the virtual reality video during rendering, resulting in the video being distorted and potentially unviewable when displayed.

[0041] In various implementations, modifications and/or additions to the ISOBMFF can indicate that a file that has been formatted according to the ISOBMFF, or a format derived from the ISOBMFF, includes virtual reality content. For example, in some implementations, a file can include a file-level indication, which signals (for example, indicates) that the contents of the file are formatted for use in virtual reality use cases or implementations. As another example, in some implementations, a file can include a movie-level indication, which signals (for example, indicates) that the movie presentation in the file includes virtual reality content. As another example, in some implementations, a file can include a track-level indication, which signals (for example, indicates) that a track includes virtual reality content. In various implementations, parameters related to the virtual reality content can also be signaled at the file, movie and/or track level.

[0042] In these and other implementations, playback devices can recognize when a file includes virtual reality content. In some cases, playback devices that are not capable of displaying virtual reality content can ignore and/or skip the virtual reality media.

[0043] Certain aspects and embodiments of this invention are discussed. Some of these aspects and embodiments can be applied independently and some of them can be applied in combination, as would be apparent to those skilled in the art. In the description below, for the purposes of explanation, specific details are set forth in order to provide a thorough understanding of the embodiments of the invention. However, it will be apparent that various embodiments can be practiced without these specific details. The figures and description are not intended to be restrictive.

[0044] The description below provides exemplary embodiments only, and is not intended to limit the scope, applicability or configuration of the invention. Rather, the following description of the exemplary embodiments will provide those skilled in the art with an enabling description for implementing an exemplary embodiment.
It should be understood that various changes can be made to the function and arrangement of the elements without departing from the spirit and scope of the invention as set forth in the appended claims.

[0045] Specific details are given in the description below to provide a thorough understanding of the embodiments. However, it will be apparent to one of ordinary skill in the art that the embodiments can be practiced without these specific details. For example, circuits, systems, networks, processes and other components can be shown as components in block diagram form in order not to obscure the embodiments with unnecessary detail. In other cases, well-known circuits, processes, algorithms, structures and techniques can be shown without unnecessary detail in order to avoid obscuring the embodiments.

[0046] In addition, it is noted that individual embodiments can be described as a process that is illustrated as a flowchart, a flow diagram, a data flow diagram, a structure diagram or a block diagram. Although a flowchart can describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations can be rearranged. A process is terminated when its operations are completed, but it can have additional steps not included in a figure. A process can correspond to a method, a function, a procedure, a subroutine, a subprogram, etc. When a process corresponds to a function, its termination can correspond to a return of the function to the calling function or to the main function.

[0047] The term computer-readable medium includes, but is not limited to, portable or non-portable storage devices, optical storage devices, and various other media capable of storing, containing or carrying instruction(s) and/or data. A computer-readable medium can include a non-transitory medium in which data can be stored and which does not include carrier waves and/or transitory electronic signals propagating wirelessly or over wired connections. Examples of a non-transitory medium can include, but are not limited to, a magnetic disk or tape, optical storage media such as a compact disc (CD) or digital versatile disc (DVD), flash memory, memory or memory devices. A computer-readable medium can have stored thereon code and/or machine-executable instructions that can represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a class, or any combination of instructions, data structures or program statements. A code segment can be coupled to another code segment or to a hardware circuit by passing and/or receiving information, data, arguments, parameters or memory contents. Information, arguments, parameters, data, etc. can be passed, forwarded or transmitted by any suitable means, including memory sharing, message passing, token passing, network transmission or the like.

[0048] Furthermore, embodiments can be implemented by hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. When implemented in software, firmware, middleware or microcode, the program code or code segments to perform the necessary tasks (for example, a computer program product) can be stored in a computer-readable or machine-readable medium. A processor (or processors) can perform the necessary tasks.
[0049] Figure 1 is a block diagram illustrating an example of a system 100 including an encoding device 104 and a decoding device 112. The encoding device 104 can be part of a source device, and the decoding device 112 can be part of a receiving device. The source device and/or the receiving device can include an electronic device, such as a mobile or stationary telephone handset (for example, a smartphone, a cellular telephone, or the like), a desktop computer, a laptop or notebook computer, a tablet computer, a set-top box, a television, a camera, a display device, a digital media player, a video game console, a video streaming device, or any other suitable electronic device. In some examples, the source device and the receiving device can include one or more wireless transmitters for wireless communications. The coding techniques described in this document are applicable to video coding in various multimedia applications, including streaming video transmissions (for example, over the Internet), television broadcasts or transmissions, encoding of digital video for storage on a data storage medium, decoding of digital video stored on a data storage medium, or other applications. In some examples, system 100 can support one-way or two-way video transmission to support applications such as video conferencing, video streaming, video playback, video broadcasting and/or video telephony.

[0050] The encoding device 104 (or encoder) can be used to encode video data, including virtual reality video data, using a video coding standard or protocol to generate an encoded video bitstream. Video coding standards include ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including its scalable video coding and multiview video coding extensions, known as SVC and MVC, respectively. A more recent video coding standard, High Efficiency Video Coding (HEVC), has been finalized by the Joint Collaborative Team on Video Coding (JCT-VC) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). Various extensions to HEVC deal with multi-layer video coding and are also being developed by the JCT-VC, including the multiview extension to HEVC, called MV-HEVC, and the scalable extension to HEVC, called SHVC, or any other suitable coding protocol.

[0051] The implementations described here describe examples using the HEVC standard or extensions thereof. However, the techniques and systems described here may also be applicable to other coding standards, such as AVC, MPEG, extensions thereof, or other suitable coding standards already available or not yet available or developed. Accordingly, while the techniques and systems described here may be described with reference to a particular video coding standard, one of ordinary skill in the art will understand that the description should not be interpreted as applying only to that particular standard.

[0052] A video source 102 can provide the video data to the encoding device 104. The video source 102 can be part of the source device, or can be part of a device other than the source device. The video source 102 can include a video capture device (for example, a video camera,
a camera phone, a video phone, or the like), a video file containing stored video, a video server or content provider that provides video data, a video feed interface receiving video from a video server or content provider, a computer graphics system for generating computer graphics data, a combination of such sources, or any other suitable video source. One example of a video source 102 can include an Internet protocol camera (IP camera). An IP camera is a type of digital video camera that can be used for surveillance, home security or another suitable application. Unlike analog CCTV cameras, an IP camera can send and receive data over a computer network and the Internet.

[0053] The video data from the video source 102 can include one or more input pictures or frames. A picture or frame is a still image that is part of a video. The encoder engine 106 (or encoder) of the encoding device 104 encodes the video data to generate an encoded video bitstream. In some examples, an encoded video bitstream (or video bitstream or bitstream) is a series of one or more coded video sequences. A coded video sequence (CVS) includes a series of access units (AUs) starting with an AU that has a random access point picture in the base layer and with certain properties, up to and not including a next AU that has a random access point picture in the base layer and with certain properties. For example, the certain properties of a random access point picture that starts a CVS can include a RASL flag (for example, NoRaslOutputFlag) equal to 1. Otherwise, a random access point picture (with RASL flag equal to 0) does not start a CVS. An access unit (AU) includes one or more coded pictures and control information corresponding to the coded pictures that share the same output time. Coded slices of pictures are encapsulated at the bitstream level in data units called network abstraction layer (NAL) units. For example, an HEVC video bitstream can include one or more CVSs including NAL units. Two classes of NAL units exist in the HEVC standard, including video coding layer (VCL) NAL units and non-VCL NAL units. A VCL NAL unit includes a slice or slice segment (described below) of coded picture data, and a non-VCL NAL unit includes control information that pertains to one or more coded pictures.

[0054] The NAL units can contain a sequence of bits forming a coded representation of the video data (for example, an encoded video bitstream, a CVS of a bitstream, or the like), such as coded representations of pictures in a video. The encoder engine 106 generates coded representations of pictures by partitioning each picture into multiple slices. The slices are then partitioned into coding tree blocks (CTBs) of luminance samples and chrominance samples. A CTB of luminance samples and one or more CTBs of chrominance samples, together with the syntax for the samples, are referred to as a coding tree unit (CTU). A CTU is the basic processing unit for HEVC encoding. A CTU can be split into multiple coding units (CUs) of varying sizes. A CU contains luminance and chrominance sample arrays that are referred to as coding blocks (CBs).

[0055] The luminance and chrominance CBs can be further split into prediction blocks (PBs). A PB is a block of samples of the luminance component or of a chrominance component that uses the same motion parameters for inter-prediction.
The luminance PB and one or more chrominance PBs, together with the associated syntax, form a prediction unit (PU). A set of motion parameters is signaled in the bitstream for each PU and is used for inter-prediction of the luminance PB and the one or more chrominance PBs. A CB can also be partitioned into one or more transform blocks (TBs). A TB represents a square block of samples of a color component to which the same two-dimensional transform is applied for coding a residual prediction signal. A transform unit (TU) represents the TBs of luminance and chrominance samples, and the corresponding syntax elements.

[0056] A size of a CU corresponds to a size of the coding node and can be square in shape. For example, a size of a CU can be 8 x 8 samples, 16 x 16 samples, 32 x 32 samples, 64 x 64 samples, or any other suitable size up to the size of the corresponding CTU. The phrase N x N is used here to refer to the pixel dimensions of a video block in terms of vertical and horizontal dimensions (for example, 8 x 8 pixels). The pixels in a block can be arranged in rows and columns. In some embodiments, blocks may not have the same number of pixels in a horizontal direction as in a vertical direction. Syntax data associated with a CU can describe, for example, the partitioning of the CU into one or more PUs. Partitioning modes can differ depending on whether the CU is intra-prediction mode encoded or inter-prediction mode encoded. PUs can be partitioned to be non-square in shape. Syntax data associated with a CU can also describe, for example, the partitioning of the CU into one or more TUs according to a quadtree. A TU can be square or non-square in shape.

[0057] In accordance with the HEVC standard, transformations can be performed using transform units (TUs). The TUs can vary for different CUs. The TUs can be sized based on the size of the PUs within a given CU. The TUs can be the same size as or smaller than the PUs. In some examples, residual samples corresponding to a CU can be subdivided into smaller units using a quadtree structure known as a residual quadtree (RQT). Leaf nodes of the RQT can correspond to TUs. Pixel difference values associated with the TUs can be transformed to produce transform coefficients. The transform coefficients can then be quantized by the encoder engine 106.
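As a toy illustration of the quadtree-style partitioning described above (a CTU split into CUs, and residual samples further subdivided through an RQT), the sketch below recursively splits a 64 x 64 block down to a minimum size and lists the resulting sub-block positions and sizes. The split rule is arbitrary; a real encoder chooses splits by rate-distortion criteria.

    def split_ctu(x, y, size, min_size, decide_split):
        """Recursively partition a square block at (x, y) of the given size.
        decide_split(x, y, size) returns True when the block should be split into
        four quadrants; recursion stops once the minimum size is reached."""
        if size > min_size and decide_split(x, y, size):
            half = size // 2
            blocks = []
            for dy in (0, half):
                for dx in (0, half):
                    blocks += split_ctu(x + dx, y + dy, half, min_size, decide_split)
            return blocks
        return [(x, y, size)]

    # Arbitrary toy rule: keep splitting blocks whose top-left corner lies on the diagonal.
    cus = split_ctu(0, 0, 64, 8, lambda x, y, size: x == y)
    print(len(cus))   # number of leaf blocks produced by this particular split pattern
    print(cus[:4])    # first few (x, y, size) leaf blocks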
[0058] Once the pictures of the video data are partitioned into CUs, the encoder engine 106 predicts each PU using a prediction mode. The prediction is then subtracted from the original video data to obtain residual data (described below). For each CU, a prediction mode can be signaled within the bitstream using syntax data. A prediction mode can include intra-prediction (or intra-picture prediction) or inter-prediction (or inter-picture prediction). Using intra-prediction, each PU is predicted from neighboring image data in the same picture using, for example, DC prediction to find an average value for the PU, planar prediction to fit a planar surface to the PU, direction prediction to extrapolate from neighboring data, or any other suitable type of prediction. Using inter-prediction, each PU is predicted using motion compensation prediction from image data in one or more reference pictures (before or after the current picture in output order). The decision whether to code a picture area using inter-picture or intra-picture prediction can be made, for example, at the CU level. In some examples, the one or more slices of a picture are assigned a slice type. Slice types include an I slice, a P slice and a B slice. An I slice (intra-frames, independently decodable) is a slice of a picture that is only coded by intra-prediction and is therefore independently decodable, since the I slice requires only the data within the frame to predict any block of the slice. A P slice (uni-directional predicted frames) is a slice of a picture that can be coded with intra-prediction and with uni-directional inter-prediction. Each block within a P slice is coded either with intra-prediction or with inter-prediction. When inter-prediction applies, the block is only predicted from one reference picture and, therefore, reference samples are only from one reference region of one frame. A B slice (bi-directional predictive frames) is a slice of a picture that can be coded with intra-prediction and with inter-prediction. A block of a B slice can be bi-directionally predicted from two reference pictures, where each picture contributes one reference region and sample sets of the two reference regions are weighted (for example, with equal weights) to produce the prediction signal of the bi-directionally predicted block. As explained above, slices of one picture are independently coded. In some cases, a picture can be coded as just one slice.

[0059] A PU can include data related to the prediction process. For example, when the PU is encoded using intra-prediction, the PU can include data describing an intra-prediction mode for the PU. As another example, when the PU is encoded using inter-prediction, the PU can include data defining a motion vector for the PU. The data defining the motion vector for a PU can describe, for example, a horizontal component of the motion vector, a vertical component of the motion vector, a resolution for the motion vector (for example, one-quarter pixel precision or one-eighth pixel precision), a reference picture to which the motion vector points, and/or a reference picture list (for example, List 0, List 1 or List C) for the motion vector.

[0060] The encoding device 104 can then perform transformation and quantization. For example, following prediction, the encoder engine 106 can calculate residual values corresponding to the PU. The residual values can comprise pixel difference values. Any residual data remaining after prediction is performed is transformed using a block transform, which can be based on a discrete cosine transform, a discrete sine transform, an integer transform, a wavelet transform, or another suitable transform function. In some cases, one or more block transforms (for example, of sizes 32 x 32, 16 x 16, 8 x 8, 4 x 4, or the like) can be applied to the residual data in each CU. In some embodiments, a TU can be used for the transform and quantization processes implemented by the encoder engine 106. A given CU having one or more PUs can also include one or more TUs. As described in more detail below, the residual values can be transformed into transform coefficients using the block transforms, and then can be quantized and scanned using TUs to produce serialized transform coefficients for entropy coding.

[0061] In some embodiments, following intra-predictive or inter-predictive coding using the PUs of a CU, the encoder engine 106 can calculate residual data for the TUs of the CU. The PUs can comprise pixel data in the spatial domain (or pixel domain). The TUs can comprise coefficients in the transform domain following application of a block transform.
As mentioned previously, the residual data can correspond to pixel difference values between pixels of the unencoded picture and prediction values corresponding to the PUs. The encoder engine 106 can form the TUs including the residual data for the CU, and can then transform the TUs to produce transform coefficients for the CU.

[0062] The encoder engine 106 can perform quantization of the transform coefficients. Quantization provides further compression by quantizing the transform coefficients to reduce the amount of data used to represent the coefficients. For example, quantization can reduce the bit depth associated with some or all of the coefficients. In one example, a coefficient with an n-bit value can be rounded to an m-bit value during quantization, with n being greater than m.

[0063] Once quantization is performed, the coded video bitstream includes quantized transform coefficients, prediction information (for example, prediction modes, motion vectors, or the like), partitioning information, and any other suitable data, such as other syntax data. The different elements of the coded video bitstream can then be entropy encoded by the encoder engine 106. In some examples, the encoder engine 106 can use a predefined scan order to scan the quantized transform coefficients to produce a serialized vector that can be entropy encoded. In some examples, the encoder engine 106 can perform an adaptive scan. After scanning the quantized transform coefficients to form a vector (for example, a one-dimensional vector), the encoder engine 106 can entropy encode the vector. For example, the encoder engine 106 can use context-adaptive variable-length coding, context-adaptive binary arithmetic coding, syntax-based context-adaptive binary arithmetic coding, probability interval partitioning entropy coding, or another suitable entropy coding technique.
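A small worked example of the quantization and scanning steps in paragraphs [0062] and [0063]: a toy uniform quantizer divides each transform coefficient by a step size and rounds, and a fixed diagonal scan then serializes the quantized block into a one-dimensional vector for entropy coding. The 4 x 4 block, the step size and the scan order are illustrative simplifications, not the exact operations of any codec.

    def quantize(block, step):
        """Toy uniform quantization: divide each coefficient by the step size and round,
        which reduces the number of bits needed to represent the coefficients."""
        return [[round(c / step) for c in row] for row in block]

    def diagonal_scan(block):
        """Serialize a square block into a 1-D vector along anti-diagonals
        (a simplified stand-in for the scan orders used in practice)."""
        n = len(block)
        order = sorted(((x + y, y, x) for y in range(n) for x in range(n)))
        return [block[y][x] for _, y, x in order]

    coeffs = [[52, -18, 7, 1],
              [-20, 9, -3, 0],
              [6, -2, 1, 0],
              [1, 0, 0, 0]]
    q = quantize(coeffs, step=8)
    print(q)                   # [[6, -2, 1, 0], [-2, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]]
    print(diagonal_scan(q))    # [6, -2, -2, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

Note how quantization collapses most of the small coefficients to zero and the scan groups the remaining non-zero values near the front of the vector, which is what makes the subsequent entropy coding effective.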
[0064] The output 110 of the encoding device 104 can send the NAL units that make up the encoded video bitstream data over the communications link 120 to the decoding device 112 of the receiving device. The input 114 of the decoding device 112 can receive the NAL units. The communications link 120 may include a channel provided by a wireless network, a wired network, or a combination of a wireless and a wired network. A wireless network can include any wireless interface or combination of wireless interfaces and can include any suitable wireless network (for example, the Internet or another wide area network, a packet-based network, Wi-Fi™, radio frequency (RF), UWB, Wi-Fi Direct, cellular, Long Term Evolution (LTE), WiMax™ or the like). A wired network can include any wired interface (for example, fiber, Ethernet, powerline Ethernet, Ethernet over coaxial cable, digital subscriber line (DSL) or the like). Wireless and/or wired networks can be implemented using a variety of equipment, such as base stations, routers, access points, bridges, gateways, switches or the like. The encoded video bitstream data can be modulated according to a communication standard, such as a wireless communication protocol, and transmitted to the receiving device.

[0065] In some examples, the encoding device 104 can store encoded video bitstream data in the storage 108. The output 110 can retrieve the encoded video bitstream data from the encoder engine 106 or from the storage 108. The storage 108 can include any of a variety of distributed or locally accessed data storage media. For example, the storage 108 may include a hard disk drive, a storage disc, flash memory, volatile or non-volatile memory, or any other suitable digital storage medium for storing encoded video data.

[0066] The input 114 of the decoding device 112 receives the encoded video bitstream data and can provide the video bitstream data to the decoder engine 116, or to the storage 118 for later use by the decoder engine 116. The decoder engine 116 can decode the encoded video bitstream data by entropy decoding (for example, using an entropy decoder) and extracting the elements of the one or more coded video sequences that make up the encoded video data. The decoder engine 116 can then rescale and perform an inverse transform on the encoded video bitstream data. The residual data is then passed to a prediction stage of the decoder engine 116. The decoder engine 116 then predicts a block of pixels (for example, a PU). In some examples, the prediction is added to the output of the inverse transform (the residual data).

[0067] The decoding device 112 may output the decoded video to a video destination device 122, which may include a display or other output device for displaying the decoded video data to a consumer of the content. In some aspects, the video destination device 122 may be part of the receiving device that includes the decoding device 112. In some aspects, the video destination device 122 may be part of a separate device other than the receiving device.

[0068] Supplemental Enhancement Information (SEI) messages can be included in video bitstreams. For example, SEI messages can be used to carry information (for example, metadata) that is not essential for decoding the bitstream by the decoding device 112. This information is useful for improving the display or processing of the decoded output (for example, such information could be used by decoder-side entities to improve the viewability of the content).

[0069] In some embodiments, the video encoding device 104 and/or the video decoding device 112 can be integrated with an audio encoding device and an audio decoding device, respectively. The video encoding device 104 and/or the video decoding device 112 may also include other hardware or software that is necessary to implement the coding techniques described above, such as one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware or any combinations thereof. The video encoding device 104 and the video decoding device 112 can be integrated as part of a combined encoder/decoder (codec) in a respective device.
[0070] Extensions to the HEVC standard include the Multiview Video Coding extension, referred to as MV-HEVC, and the Scalable Video Coding extension, referred to as SHVC. The MV-HEVC and SHVC extensions share the concept of layered coding, with different layers being included in the encoded video bitstream. Each layer in a coded video sequence is addressed by a unique layer identifier (ID). A layer ID can be present in the header of a NAL unit to identify the layer with which the NAL unit is associated. In MV-HEVC, different layers can represent different views of the same scene in the video bitstream. In SHVC, different scalable layers are provided that represent the video bitstream at different spatial resolutions (or picture resolutions) or at different reconstruction fidelities. The scalable layers can include a base layer (with layer ID = 0) and one or more enhancement layers (with layer IDs = 1, 2, ..., n). The base layer can conform to a profile of the first version of HEVC, and represents the lowest available layer in a bitstream. The enhancement layers have increased spatial resolution, temporal resolution or frame rate and/or reconstruction fidelity (or quality) compared to the base layer. The enhancement layers are organized hierarchically and may (or may not) depend on lower layers. In some examples, the different layers can be coded using a single-standard codec (for example, all layers are encoded using HEVC, SHVC or another coding standard). In some examples, different layers can be coded using a multi-standard codec. For example, a base layer can be coded using AVC, while one or more enhancement layers can be coded using the SHVC and/or MV-HEVC extensions of the HEVC standard. In general, a layer includes a set of VCL NAL units and a corresponding set of non-VCL NAL units. The NAL units are assigned a particular layer ID value. Layers can be hierarchical in the sense that a layer can depend on a lower layer.

[0071] As described previously, an HEVC bitstream includes a group of NAL units, including VCL NAL units and non-VCL NAL units. Non-VCL NAL units can contain parameter sets with high-level information related to the encoded video bitstream, in addition to other information. For example, a parameter set can include a video parameter set (VPS), a sequence parameter set (SPS) and a picture parameter set (PPS). Examples of goals of the parameter sets include bit rate efficiency, error resilience and the provision of interfaces to systems layers. Each slice references a single active PPS, SPS and VPS to access information that the decoding device 112 can use to decode the slice. An identifier (ID) can be coded for each parameter set, including an ID for the VPS, an ID for the SPS and an ID for the PPS. An SPS includes an SPS ID and a VPS ID. A PPS includes a PPS ID and an SPS ID. Each slice header includes a PPS ID. Using the IDs, the active parameter sets can be identified for a given slice.
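The ID-based referencing described above can be sketched with a few dictionary lookups; the parameter set contents and ID values below are placeholders rather than values drawn from an actual bitstream.

    # Minimal sketch of resolving the active parameter sets for a slice.
    # The stored parameter sets and the slice header below are illustrative only.
    vps_by_id = {0: {"max_layers": 1}}
    sps_by_id = {0: {"vps_id": 0, "pic_width": 3840, "pic_height": 1920}}
    pps_by_id = {0: {"sps_id": 0, "entropy_coding": "cabac"}}

    slice_header = {"pps_id": 0}

    # Each slice header carries a PPS ID; the PPS points at an SPS, which points
    # at a VPS, identifying the full set of active parameters for the slice.
    active_pps = pps_by_id[slice_header["pps_id"]]
    active_sps = sps_by_id[active_pps["sps_id"]]
    active_vps = vps_by_id[active_sps["vps_id"]]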
[0072] VCL NAL units include coded picture data that forms the encoded video bitstream. Various types of VCL NAL units are defined in the HEVC standard, as illustrated in Table A below. In a single-layer bitstream, as defined in the first HEVC standard, the VCL NAL units contained in an AU have the same NAL unit type value, with the NAL unit type value defining the type of the AU and the type of coded picture within the AU. For example, the VCL NAL units of a particular AU may include instantaneous decoding refresh (IDR) NAL units (value 19), making the AU an IDR AU and the coded picture of the AU an IDR picture. The given type of a VCL NAL unit relates to the picture, or portion thereof, contained in the VCL NAL unit (for example, a slice or slice segment of a picture in a VCL NAL unit). Three classes of pictures are defined in the HEVC standard: leading pictures, trailing pictures and intra random access point (IRAP) pictures (also known as random access pictures). In a multi-layer bitstream, the VCL NAL units of a picture within an AU have the same NAL unit type value and the same type of coded picture. For example, a picture containing VCL NAL units of type IDR is said to be an IDR picture in the AU. In another example, when an AU contains a picture that is an IRAP picture in the base layer (with layer ID equal to 0), the AU is an IRAP AU.

[0073] An encoded video bitstream as discussed above can be written or packaged into one or more files in order to transfer the bitstream from the encoding device 104 to the decoding device 112. For example, the output 110 can include a file writing engine configured to generate one or more files that contain the bitstream. The output 110 can transmit the one or more files over the communications link 120 to the decoding device 112. Alternatively or additionally, the one or more files can be stored on a storage medium (for example, a tape, a magnetic disk or a hard disk drive, or some other medium) for later transmission to the decoding device 112.

[0074] The decoding device 112 can include, for example in the input 114, a file parsing engine. The file parsing engine can read files received over the communications link 120 or from a storage medium. The file parsing engine can also extract samples from the file and reconstruct the bitstream for decoding by the decoder engine 116. In some cases, the reconstructed bitstream can be the same as the bitstream generated by the encoder engine 106. In some cases, the encoder engine 106 may have generated the bitstream with several possible options for decoding the bitstream, in which case the reconstructed bitstream may include only one, or fewer than all, of the possible options.

[0075] An encoded video bitstream, as discussed above, can be written or packaged into one or more files using the ISOBMFF, a file format derived from the ISOBMFF, some other file format and/or a combination of file formats including the ISOBMFF. The file or files can be played back using a video player device, can be transmitted and then displayed, and/or can be stored.

[0076] Figure 2 illustrates an example of an ISO base media file 200 that contains data and metadata for a video presentation, formatted according to the ISOBMFF. The ISOBMFF is designed to contain timed media information in a flexible and extensible format that facilitates interchange, management, editing and presentation of the media. The presentation of the media may be local to the system containing the presentation, or the presentation may be via a network or other stream delivery mechanism.

[0077] A presentation, as defined by the ISOBMFF specification, is a sequence of pictures, often related by having been captured sequentially by a video capture device, or related for some other reason. Herein, a presentation may also be referred to as a movie or a video presentation. A presentation can include audio. A single presentation can be contained in one or more files, with one file containing the metadata for the entire presentation. The metadata includes information such as timing and framing data, descriptors, pointers, parameters and other information that describes the presentation. The metadata does not include the video and/or audio data itself. Files other than the file containing the metadata need not be formatted according to the ISOBMFF, and need only be formatted such that these files can be referenced by the metadata.
[0078] The file structure of an ISO base media file is object oriented, and the structure of an individual object in the file can be inferred directly from the object's type. The objects in an ISO base media file are referred to as boxes by the ISOBMFF specification. An ISO base media file is structured as a sequence of boxes, which can contain other boxes. Boxes generally include a header that provides a size and a type for the box. The size describes the entire size of the box, including the header, the fields, and all boxes contained within the box. Boxes of a type that is not recognized by a player device are typically ignored and skipped.

[0079] As illustrated by the example of figure 2, at the top level of the file, an ISO base media file 200 can include a file type box 210, a movie box 220 and one or more movie fragment boxes 230a, 230n. Other boxes that can be included at this level, but that are not represented in this example, include free space boxes, metadata boxes and media data boxes, among others.

[0080] An ISO base media file can include a file type box 210, identified by the box type ftyp. The file type box 210 identifies an ISOBMFF specification that is the most suitable for parsing the file. "Most" suitable, in this case, means that the ISO base media file 200 may have been formatted according to a particular ISOBMFF specification, but is likely compatible with other iterations of the specification. This most suitable specification is referred to as the major brand. A player device can use the major brand to determine whether the device is capable of decoding and displaying the contents of the file. The file type box 210 can also include a version number, which can be used to indicate a version of the ISOBMFF specification. The file type box 210 can also include a list of compatible brands, which includes a list of other brands with which the file is compatible. An ISO base media file can be compatible with more than one major brand.

[0081] When an ISO base media file 200 includes a file type box 210, there is only one file type box. An ISO base media file 200 may omit the file type box 210 in order to be compatible with older player devices. When an ISO base media file 200 does not include a file type box 210, a player device can assume a default major brand (for example, mp41), minor version (for example, 0) and compatible brand (for example, mp41). The file type box 210 is typically placed as early as possible in the ISO base media file 200.
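A minimal sketch of how a player might scan the top-level box headers and read the brands from the file type box is shown below. It assumes a well-formed file with 32-bit box sizes, the file name is hypothetical, and the 'vrcb' brand checked at the end is a hypothetical virtual reality brand rather than one defined by the ISOBMFF.

    # Minimal sketch: scan top-level ISOBMFF boxes and read the ftyp brands.
    # Assumes 32-bit box sizes (no 'largesize' or size==0 handling).
    import struct

    def iter_boxes(data, start=0, end=None):
        end = len(data) if end is None else end
        offset = start
        while offset + 8 <= end:
            size, box_type = struct.unpack(">I4s", data[offset:offset + 8])
            yield box_type.decode("ascii"), data[offset + 8:offset + size]
            offset += size

    def read_ftyp(payload):
        major_brand = payload[0:4].decode("ascii")
        minor_version = struct.unpack(">I", payload[4:8])[0]
        compatible = [payload[i:i + 4].decode("ascii") for i in range(8, len(payload), 4)]
        return major_brand, minor_version, compatible

    with open("presentation.mp4", "rb") as f:   # hypothetical file name
        data = f.read()
    for box_type, payload in iter_boxes(data):
        if box_type == "ftyp":
            major, minor, compatible = read_ftyp(payload)
            # 'vrcb' is a hypothetical brand standing in for a virtual-reality
            # compatible brand signaled in the file type box.
            is_vr = major == "vrcb" or "vrcb" in compatible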
[0082] An ISO base media file can also include a movie box 220, which contains the metadata for the presentation. The movie box 220 is identified by the box type moov. ISO/IEC 14496-12 provides that a presentation, whether contained in one file or in several files, can include only one movie box 220. Often, the movie box 220 is near the beginning of an ISO base media file. The movie box 220 includes a movie header box 222, and can include one or more track boxes 224, as well as other boxes.

[0083] The movie header box 222, identified by the box type mvhd, can include information that is media independent and relevant to the presentation as a whole. For example, the movie header box 222 can include information such as a creation time, a modification time, a timescale and/or a duration for the presentation, among other things. The movie header box 222 can also include an identifier that identifies the next track in the presentation. For example, the identifier can point to the track box 224 contained by the movie box 220 in the illustrated example.

[0084] The track box 224, identified by the box type trak, can contain the information for a track of the presentation. A presentation can include one or more tracks, where each track is independent of the other tracks in the presentation. Each track can include the temporal and spatial information that is specific to the content of the track, and each track can be associated with a media box. The data in a track can be media data, in which case the track is a media track, or the data can be packetization information for streaming protocols, in which case the track is a hint track. Media data includes, for example, video and audio data. In the illustrated example, the example track box 224 includes a track header box 224a and a media box 224b. A track box can include other boxes, such as a track reference box, a track group box, an edit box, a user data box, a meta box, and others.

[0085] The track header box 224a, identified by the box type tkhd, can specify the characteristics of a track contained in the track box 224. For example, the track header box 224a can include a creation time, a modification time, a duration, a track identifier, a layer identifier, a group identifier, a volume, a width and/or a height of the track, among other things. For a media track, the track header box 224a can further identify whether the track is enabled, whether the track should be played as part of the presentation, or whether the track can be used to preview the presentation, among other things. The presentation of a track is generally assumed to be at the beginning of the presentation. The track box 224 can include an edit list box, not illustrated here, which can include an explicit timeline map. The timeline map can specify, among other things, an offset time for the track, where the offset indicates a start time, after the beginning of the presentation, for the track.

[0086] In the illustrated example, the track box 224 also includes a media box 224b, identified by the box type mdia. The media box 224b can contain the objects and information about the media data in the track. For example, the media box 224b can contain a handler reference box, which can identify the media type of the track and the process by which the media in the track is presented. As another example, the media box 224b can contain a media information box, which can specify the characteristics of the media in the track. The media information box can further include a table of samples, where each sample describes a chunk of media data (for example, video or audio data), including, for example, the location of the data for the sample. The data for a sample is stored in a media data box, discussed further below. As with most other boxes, the media box 224b can also include a media header box.
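Because boxes nest, a player typically walks the containers to reach the track-level metadata described above. The sketch below prints the box hierarchy; the set of container types is abridged for brevity and is only illustrative.

    # Minimal sketch: recursively list the box hierarchy of an ISO base media file.
    # Assumes 32-bit box sizes; the container list is abridged.
    import struct

    CONTAINER_TYPES = {"moov", "trak", "mdia", "minf", "stbl", "moof", "traf"}

    def child_boxes(data):
        offset = 0
        while offset + 8 <= len(data):
            size, box_type = struct.unpack(">I4s", data[offset:offset + 8])
            yield box_type.decode("ascii"), data[offset + 8:offset + size]
            offset += size

    def walk(data, depth=0):
        for box_type, payload in child_boxes(data):
            print("  " * depth + box_type)
            if box_type in CONTAINER_TYPES:
                # Container boxes hold only other boxes, so recurse into them.
                walk(payload, depth + 1)

    with open("presentation.mp4", "rb") as f:   # hypothetical file name
        walk(f.read())   # e.g. ftyp, moov, moov/mvhd, moov/trak, moov/trak/tkhd, ...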
[0087] In the illustrated example, the example ISO base media file 200 also includes several fragments 230a, 230b, 230c, 230n of the presentation. The fragments 230a, 230b, 230c, 230n are not ISOBMFF boxes, but rather describe a movie fragment box 232 and the media data box 238 that is referenced by the movie fragment box 232. The movie fragment box 232 and the media data boxes 238 are top-level boxes, but are grouped here to indicate the relationship between a movie fragment box 232 and a media data box 238.

[0088] A movie fragment box 232, identified by the box type moof, can extend a presentation by including additional information that would otherwise be stored in the movie box 220. Using movie fragment boxes 232, a presentation can be built incrementally. A movie fragment box 232 can include a movie fragment header box 234 and a track fragment box 236, as well as other boxes not illustrated here.

[0089] The movie fragment header box 234, identified by the box type mfhd, can include a sequence number. A player device can use the sequence number to verify that the fragment 230a includes the next piece of data for the presentation. In some cases, the contents of a file, or the files for a presentation, can be provided to a player device out of order. For example, network packets can frequently arrive in an order other than the order in which the packets were originally transmitted. In such cases, the sequence number can assist a player device in determining the correct order for the fragments.

[0090] The movie fragment box 232 can also include one or more track fragment boxes 236, identified by the box type traf. A movie fragment box 232 can include a set of track fragments, zero or more per track. The track fragments can contain zero or more track runs, each of which describes a contiguous run of samples for a track. Track fragments can be used to add empty time to a track, in addition to adding samples to the track.

[0091] The media data box 238, identified by the box type mdat, contains media data. In video tracks, the media data box 238 contains video frames. A media data box can alternatively or additionally include audio frames. A presentation can include zero or more media data boxes, contained in one or more individual files. The media data is described by metadata. In the illustrated example, the media data in the media data box 238 can be described by the metadata included in the track fragment box 236. In other examples, the media data in a media data box can be described by metadata in the movie box 220. The metadata can refer to specific media data by an absolute offset within the file 200, such that a media data header and/or free space within the media data box 238 can be skipped.

[0092] Other fragments 230b, 230c, 230n in the ISO base media file 200 can contain boxes similar to those illustrated for the first fragment 230a, and/or can contain other boxes.

[0093] The ISOBMFF includes support for streaming media data over a network, in addition to supporting local playback of the media. The file or files that include one movie presentation can include additional tracks, called hint tracks, which contain instructions that can assist a streaming server in forming and transmitting the file or files as packets. These instructions can include, for example, data for the server to send (for example, header information) or references to segments of the media data. A file can include separate hint tracks for different streaming protocols. Hint tracks can also be added to a file without the need to reformat the file.
[0094] One method for streaming media data is Dynamic Adaptive Streaming over Hypertext Transfer Protocol (HTTP), or DASH (defined in ISO/IEC 23009-1:2014). DASH, which is also known as MPEG-DASH, is an adaptive bitrate streaming technique that enables high-quality streaming of media content using conventional HTTP web servers. DASH operates by breaking the media content into a sequence of small HTTP-based file segments, where each segment contains a short interval of the content's playback time. Using DASH, a server can provide the media content at different bit rates. A client device that is playing the media can select among the alternative bit rates when downloading a next segment, and thus adapt to changing network conditions. DASH uses the HTTP web server infrastructure of the Internet to deliver content over the World Wide Web. DASH is independent of the codec used to encode and decode the media content, and thus operates with codecs such as H.264 and HEVC, among others.

[0095] The ISOBMFF specification specifies six types of Stream Access Points (SAPs) for use with DASH. The first two SAP types (types 1 and 2) correspond to instantaneous decoding refresh (IDR) pictures in H.264/AVC and HEVC. For example, an IDR picture is an intra picture (I picture) that completely refreshes or reinitializes the decoding process at the decoder and starts a new coded video sequence. In some examples, an IDR picture and any picture following the IDR picture in decoding order cannot be dependent on any picture that comes before the IDR picture in decoding order.

[0096] The third SAP type (type 3) corresponds to open-GOP (Group of Pictures) random access points, hence to broken link access (BLA) or clean random access (CRA) pictures in HEVC. For example, a CRA picture is also an I picture. A CRA picture may not refresh the decoder and may not begin a new CVS, allowing leading pictures of the CRA picture to depend on pictures that come before the CRA picture in decoding order. Random access can be performed at a CRA picture by decoding the CRA picture, the leading pictures associated with the CRA picture that do not depend on any picture coming before the CRA picture in decoding order, and all associated pictures that follow the CRA picture in both decoding and output order. In some cases, a CRA picture may not have associated leading pictures. In some embodiments, in the multi-layer case, an IDR or CRA picture that belongs to a layer with a layer ID greater than 0 can be a P picture or a B picture, but these pictures can only use inter-layer prediction from other pictures that belong to the same access unit as the IDR or CRA picture and that have a layer ID smaller than that of the layer containing the IDR or CRA picture.

[0097] The fourth SAP type (type 4) corresponds to gradual decoding refresh (GDR) random access points.

[0098] The ISOBMFF, although flexible and extensible and widely used to store and transmit various types of media, does not include mechanisms for storing virtual reality video or for identifying the contents of an ISO base media file as including virtual reality content. Player devices may thus be unable to determine that the contents of a file include virtual reality video. Player devices that are not capable of displaying virtual reality content may attempt to display the content anyway, resulting in a distorted presentation.
[0099] In various deployments, the ISOBMFF and/or file formats derived from the ISOBMFF can be modified and/or extended so that virtual reality content can be identified. These deployments can involve boxes, brand values, reserved bits in a box and/or other indicators that can, each independently or in combination, identify the virtual reality content.

[00100] Figure 3A and figure 3B illustrate examples in which a top-level box in an ISO base media file 300 is used to indicate that the file 300 includes virtual reality content. In various deployments, the use of a top-level box indicates that all of the content in the file 300 is virtual reality content. The file 300 can include a file type box 310, which can specify the brand(s) or particular iterations or derivatives of the ISOBMFF with which the file 300 is compatible. The file 300 can also include a movie box 320, which can contain the metadata for a presentation. The file 300 can also optionally include one or more fragments 330a, 330b, 330c, 330n, as discussed above.

[00101] In the example of figure 3A, the file type box 310 can be used to indicate that the file 300 includes virtual reality content. The file type box 310 can be used, for example, to specify a brand value that indicates that the file is compatible with a virtual reality brand. In various deployments, the compatible brands listed in the file type box 310 can also be used to provide optional brand indicators, which can be used to provide parameters related to the virtual reality content. For example, one compatible brand value can indicate that the virtual reality content is two-dimensional (2D), while another compatible brand value can indicate that the virtual reality content is three-dimensional (3D). As another example, compatible brand values can be used to indicate a mapping type; that is, whether the spherical representation of the virtual reality video was mapped to an equirectangular, cube, pyramid or some other format for storage in the file 300. In various deployments, information such as the dimensionality and/or the mapping of the video can alternatively or additionally be indicated using optional fields in the file type box 310.

[00102] In the example of figure 3B, a new box type 360 has been defined. The new box type 360 is a top-level box, similar to the file type box 310. The presence of the new box type 360 in the file, and/or indicators within the new box type 360, can be used to indicate that the file 300 includes virtual reality content. For example, the new box type 360 can specify a virtual-reality-compatible brand value, and/or include a brand value compatible with virtual reality content in a list of compatible brands. The new box type 360 can further include optional parameters that can indicate, for example, whether the virtual reality content is 2D or 3D and/or a mapping for the virtual reality data stored in the file 300. Specifying the new box type 360 can avoid the need to modify the file type box 310, as in the example of figure 3A. Player devices that do not recognize the new box type 360 can ignore it.

[00103] When the file type box 310, or a new box type 360 defined at the top level of the file, is used to indicate that the file 300 includes virtual reality content, in some deployments the file 300 may also not need to include indicators in other boxes in the file 300 to signal the presence of the virtual reality content.
[00104] Figure 4 illustrates an example in which a movie-level indication is used in an ISO base media file 400 to indicate that the file 400 includes virtual reality content. The file 400 can include a file type box 410, which can specify the brand(s) or particular iterations or derivatives of the ISOBMFF with which the file 400 is compatible. The file 400 can also include a movie box 420, which can contain the metadata for a presentation. The file 400 can also optionally include one or more fragments 430a, 430b, 430c, 430n, as discussed above.

[00105] As discussed above, the movie box 420 can include a movie header box 422 and, optionally, one or more track boxes 424. In the example of figure 4, the movie header box 422 is used to indicate that the movie or presentation described by the movie box 420 includes virtual reality content. For example, a reserved bit in the movie header box 422, when set to one value, can indicate that the movie content is virtual reality video, and can be set to another value when the movie may or may not be virtual reality video. In one illustrative example, if one of the reserved bits is used to carry the indication, the bit equal to 1 indicates that the content is virtual reality video content, and the bit equal to 0 indicates that the content may or may not be virtual reality video content. Player devices that are not configured to process the reserved bits can ignore these bits.

[00106] Other fields and/or reserved bits in the movie header box 422 can be used to provide optional parameters pertaining to the virtual reality content. For example, the movie header box 422 can include a parameter that indicates whether the virtual reality content is 2D or 3D.

[00107] As another example, the movie header box 422 can include a parameter that indicates whether the virtual reality content is pre-stitched or post-stitched. Pre-stitched means that the different views captured for the virtual reality presentation were assembled into a single representation before being stored in the file 400. Post-stitched means that the different views were stored individually in the file 400, and will be assembled into a single representation by a decoder device.

[00108] Pre-stitched virtual reality video is typically represented as spherical, and is mapped to another shape (for example, an equirectangular, cube-mapped, pyramid-mapped or some other shape) that is more convenient for storage. The parameters that indicate the type of mapping used are another example of parameters that can be signaled in the movie header box 422, for example using reserved bits. For example, one reserved bit can be used to convey each mapping type indication. In various deployments, a player device can support multiple mapping types. In these deployments, the movie header box 422 can include a mapping type for each individual track and/or for groups of tracks.
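To make the equirectangular mapping mentioned above concrete, the sketch below converts a direction on the sphere (yaw and pitch, in degrees) into coordinates in an equirectangular frame. It is a simplified illustration of one common convention, not the specific mapping defined by any of the specifications discussed in this document.

    # Minimal sketch of an equirectangular mapping: a viewing direction on the
    # sphere is mapped to normalized (u, v) coordinates in a 2D frame.
    def sphere_to_equirect(yaw_deg, pitch_deg):
        # yaw in [-180, 180] maps to u in [0, 1]; pitch in [-90, 90] maps to v in [0, 1].
        u = (yaw_deg + 180.0) / 360.0
        v = (90.0 - pitch_deg) / 180.0
        return u, v

    def equirect_to_pixel(u, v, width, height):
        # Scale the normalized coordinates to pixel positions in the stored frame.
        x = min(int(u * width), width - 1)
        y = min(int(v * height), height - 1)
        return x, y

    u, v = sphere_to_equirect(yaw_deg=45.0, pitch_deg=30.0)
    x, y = equirect_to_pixel(u, v, width=3840, height=1920)   # (2400, 640)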
[00109] When the movie header box 422 is used to indicate that the movie presentation stored in the movie box 420 includes virtual reality video, in various deployments, other boxes in the movie box 420 may also not need to signal the presence of virtual reality video.

[00110] Figure 5 illustrates an example in which a track-level indicator is used in an ISO base media file 500 to indicate that the file 500 includes virtual reality content. The file 500 can include a file type box 510, which can specify the brand(s) or particular iterations or derivatives of the ISOBMFF with which the file 500 is compatible. The file 500 can also include a movie box 520, which can contain the metadata for a presentation. The file 500 can also optionally include one or more fragments 530a, 530b, 530c, 530n, as discussed above.

[00111] The movie box 520 includes a movie header box 522 and one or more track boxes 524, as well as other boxes not illustrated here. The movie header box 522 can include information that describes the presentation as a whole. The track box 524 can include the information for a track in the presentation. The track box 524 can include a track header box 524a and zero or more media data boxes 524b.

[00112] In the example of figure 5, the track header box 524a for a particular track box 524 is used to indicate that the track described by the track box 524 is a virtual reality track, meaning that the samples referred to by the track include virtual reality video data. The virtual reality content in the track can be indicated, for example, using reserved bits in the track header box 524a. For example, when a particular reserved bit is set to one value, the track includes virtual reality content, and when the bit is set to another value, the track may or may not include virtual reality content. In one illustrative example, if one of the reserved bits is used to carry the indication, the bit equal to 1 indicates that the content is virtual reality video content, and the bit equal to 0 indicates that the content may or may not be virtual reality video content. In some deployments, the signaling of the virtual reality content in the track header box 524a may depend on what is signaled in the movie header box 522. For example, when the movie header box 522 indicates that the movie does not include virtual reality content, then any indication in the track header box 524a that the track contains virtual reality data can be ignored.

[00113] In various deployments, other parameters related to virtual reality can also be signaled in the track header box 524a. For example, a reserved bit or some other variable can be used to indicate whether the virtual reality video in the track is pre-stitched or post-stitched. When the video in the track is pre-stitched, additional parameters can provide information such as a camera position (for example, with respect to a viewpoint and/or a viewing angle). When the video in the track is post-stitched, additional parameters can provide the mapping type between the spherical video representation and the representation (for example, equirectangular, cube-mapped, pyramid-mapped or some other shape) used to store the data in the file 500.
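The kind of check a player might perform on such a reserved bit is sketched below; the bit position and the assumption that the indicator lives in the 24-bit flags field of a full box header are hypothetical, chosen only to illustrate testing a flag, and do not correspond to an actual allocation in the ISOBMFF.

    # Minimal sketch: test a hypothetical "virtual reality" flag bit carried in
    # the 24-bit flags field of a full box (version + flags) header.
    VR_TRACK_FLAG = 0x000004   # hypothetical bit position, for illustration only

    def parse_fullbox_header(payload):
        version = payload[0]
        flags = int.from_bytes(payload[1:4], "big")
        return version, flags

    def track_is_virtual_reality(tkhd_payload):
        _, flags = parse_fullbox_header(tkhd_payload)
        # A set bit would mean the track carries virtual reality video; a clear
        # bit would mean the track may or may not carry virtual reality video.
        return bool(flags & VR_TRACK_FLAG)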
[00114] When the track header box 524a for a track box 524 is used to signal that the track includes virtual reality content, in some deployments, other boxes in the track box 524 likewise may not need to signal the presence of virtual reality content in the track.

[00115] In various deployments, techniques similar to those described above can be used to indicate virtual reality content in a file transmitted using DASH. For example, virtual reality content can be signaled at the media presentation level of a DASH presentation. A media presentation, as defined by the DASH specification, is a collection of data for a bounded or unbounded media presentation (for example, a single movie or a continuous live stream, among other examples). A media presentation can be described by a media presentation description, a document containing metadata that can be used by a DASH client to construct the appropriate HTTP uniform resource locators (URLs) to access the segments of the media presentation.

[00116] In various deployments, the media presentation description can be used to indicate that the media content described by the media presentation description includes virtual reality content. For example, an element can be modified or added to the schema for the media presentation description, where the element then signals the virtual reality content. In various deployments, attributes can also be modified or added to the media presentation description to provide information about the virtual reality content, such as whether the content is 2D or 3D, whether the content is pre-stitched or post-stitched, and/or a mapping for the video frames when the content is post-stitched. In some deployments, a virtual reality indicator in the media presentation description indicates that all of the content in the presentation is formatted for virtual reality.

[00117] In a DASH presentation, the media content for one presentation is divided into periods. A period, as defined by DASH, is a time interval within the media presentation. The presentation thus consists of a contiguous sequence of periods. Within a period, the media content typically has a consistent set of encodings, including an average bit rate, a language, a caption setting, a subtitle setting, and so on.

[00118] In various deployments, elements and/or attributes of a period can be used to indicate virtual reality content. For example, an element can be modified or added to the schema for the period, where the element then signals the virtual reality content. In various deployments, attributes can also be modified or added to the period to provide information about the virtual reality content, such as whether the content is 2D or 3D, whether the content is pre-stitched or post-stitched, and/or a mapping for the video frames when the content is post-stitched. In some deployments, a virtual reality indicator in the period indicates that the content of the presentation is formatted for virtual reality.

[00119] Within a period, the content can be organized into adaptation sets. An adaptation set represents a set of interchangeable encoded versions of one or more media content components. For example, a period can include one adaptation set for the main video component and a separate adaptation set for the main audio component. In this example, if there is other available content, such as captions or audio descriptions, each of these can have a separate adaptation set.

[00120] In various deployments, the virtual reality content can be signaled in an adaptation set. For example, an element can be modified or added to the schema for the adaptation set, where the element then signals the virtual reality content. In various deployments, attributes can also be modified or added to the adaptation set to provide information about the virtual reality content, such as whether the content is 2D or 3D, whether the content is pre-stitched or post-stitched, and/or a mapping for the video frames when the content is post-stitched. In some deployments, a virtual reality indicator in the adaptation set indicates that each of the representations in the adaptation set includes virtual reality content.
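A minimal sketch of how such an adaptation-set-level signal might look is given below; it builds a small MPD fragment with ElementTree, and the property scheme URI urn:example:vr:2017 and its value string are hypothetical placeholders rather than identifiers defined by DASH or by this document.

    # Minimal sketch: an MPD fragment with a hypothetical descriptor on the
    # adaptation set signaling virtual reality content and a projection mapping.
    import xml.etree.ElementTree as ET

    mpd = ET.Element("MPD", xmlns="urn:mpeg:dash:schema:mpd:2011")
    period = ET.SubElement(mpd, "Period", id="1")
    adaptation_set = ET.SubElement(period, "AdaptationSet", mimeType="video/mp4")

    # Hypothetical property: schemeIdUri and value are placeholders only.
    ET.SubElement(
        adaptation_set,
        "EssentialProperty",
        schemeIdUri="urn:example:vr:2017",
        value="projection=equirectangular,stereo=mono,prestitched=1",
    )
    ET.SubElement(adaptation_set, "Representation", id="video-1", bandwidth="5000000")

    print(ET.tostring(mpd, encoding="unicode"))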
[00121] An adaptation set can contain multiple alternative representations. A representation describes a deliverable encoded version of one or more media content components. Any single representation within an adaptation set can be used to render the media content components in the period. Different representations in one adaptation set can be considered perceptually equivalent, meaning that a client device can dynamically switch from one representation to another representation within the adaptation set in order to adapt to network conditions or other factors.

[00122] In various deployments, the virtual reality content can be signaled in a representation. For example, an element can be modified or added to the schema for the representation, where the element then signals the virtual reality content. In various deployments, attributes can also be modified or added to the representation to provide information about the virtual reality content, such as whether the content is 2D or 3D, whether the content is pre-stitched or post-stitched, and/or a mapping for the video frames when the content is post-stitched. In some deployments, a virtual reality indicator in the representation indicates that the content of the representation is formatted for virtual reality.

[00123] Another format related to streaming media content is the Session Description Protocol (SDP), which is described in RFC 4566. SDP can be used to describe multimedia communication sessions. Such descriptions can be used, for example, for session announcement, session invitation and parameter negotiation. SDP is not used to deliver the media itself, but can be used between endpoints to negotiate the media type, format and associated properties. The set of properties and parameters is often referred to as a session profile. SDP was originally a component of the Session Announcement Protocol (SAP), but found other uses in conjunction with the Real-time Transport Protocol (RTP), the Real Time Streaming Protocol (RTSP), the Session Initiation Protocol (SIP) and as a standalone format for describing multicast sessions.

[00124] In various deployments, an indication of virtual reality content can be included in a session description and/or in a media description in an SDP message. For example, a field can be added or modified in the session description and/or the media description to indicate the presence of virtual reality content in the streamed content. Additionally, in some deployments, parameters related to the virtual reality content can also be added to the SDP message. Such parameters can include, for example, whether the virtual reality content is 2D or 3D, whether the content is pre-stitched or post-stitched, and/or a mapping used to store the data. In this and other examples, SDP can be used in RTP-based streaming, broadcast and/or telepresence or conferencing applications to indicate that the media content includes virtual reality content.
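The sketch below assembles an SDP body with a hypothetical media-level attribute carrying virtual reality parameters; the a=vr-video attribute and its parameter names are invented for illustration and are not defined by RFC 4566 or by this document.

    # Minimal sketch: an SDP body with a hypothetical media-level attribute
    # ("a=vr-video:...") indicating virtual reality content and its properties.
    sdp_lines = [
        "v=0",
        "o=- 20518 0 IN IP4 203.0.113.1",
        "s=Omnidirectional session",
        "t=0 0",
        "m=video 49170 RTP/AVP 96",
        "a=rtpmap:96 H265/90000",
        # Hypothetical attribute; the name and parameters are placeholders.
        "a=vr-video:stereo=mono;prestitched=1;mapping=equirectangular",
    ]
    sdp_body = "\r\n".join(sdp_lines) + "\r\n"
    print(sdp_body)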
[00125] As another example, Multimedia Broadcast Multicast Services (MBMS) can be used to indicate virtual reality content when the content is transmitted over 3GPP cellular networks. MBMS is a point-to-multipoint interface specification that can provide efficient delivery of broadcast and multicast services, both within a cell and within the core network. Target applications for MBMS include mobile television, live audio and video streaming, file delivery and delivery of emergency alerts.

[00126] In various deployments, the signaling of virtual reality content, as well as of parameters related to the content, can be accomplished by adding a new feature to the MBMS feature requirements list. In various deployments, the signaling of virtual reality content can be accomplished in a similar manner for other broadcast and multicast applications.

[00127] In various deployments, when a track in an ISO base media file includes virtual reality content, various additional or alternative approaches can be used to signal the virtual reality content to a player device. Figure 6 illustrates an example of an ISO base media file 600 in which a handler box 624c is used to signal that the content of a track includes virtual reality video. The file 600 can include a file type box 610, which can specify the brand(s) or particular iterations or derivatives of the ISOBMFF with which the file 600 is compatible. The file 600 can also include a movie box 620, which can contain the metadata for a presentation. The file 600 can also optionally include one or more fragments 630a, 630b, 630c, 630n, as discussed above.

[00128] The movie box 620 includes a movie header box 622 and one or more track boxes 624, as well as other boxes not illustrated here. The movie header box 622 can include information that describes the presentation as a whole. The track box 624 can include the information for a track in the presentation. The track box 624 can include a track header box 624a and zero or more media data boxes 624b.

[00129] The media data box 624b can include a handler box 624c, among other boxes. The handler box 624c, which can also be referred to as a handler reference box, can indicate the media type of the track. The media type of the track defines the process by which the media data in the track is presented. Examples of media types include video and audio, among others. The manner in which the media is presented can include a format for the media. For example, a format (for example, aspect ratio, resolution, frame rate, etc.) that a player device uses to deliver the video data in the track can be stored in the video track, and can be identified by a version of the handler in the handler box 624c. In some cases, the file 600 can include a general handler for metadata streams of any type. In such cases, the specific format of the video content can be identified by a sample entry that describes the content.

[00130] In some cases, the media data box 624b can include a handler box 624c. The handler box 624c can be used to indicate that the track described by the track box 624 includes virtual reality data. For example, when the track describes video data, the handler box 624c can specifically be a video handler box, which can be identified by the handler type vide.
[00131] In various deployments, the handler box 624c can be used to indicate that the media content referred to by the media data box 624b includes virtual reality content. For example, the handler box 624c can include an optional indicator (for example, in a reserved bit or a new variable) that the video content contained in the track is virtual reality video. Video players that are not configured to read the optional indicator can ignore it.

[00132] In some deployments, the video handler box can optionally also include parameters that describe the virtual reality content, such as whether the virtual reality video is 2D or 3D, whether the virtual reality video is pre-stitched or post-stitched, and/or a mapping for the virtual reality video. In various deployments, parameters related to the virtual reality content can be indicated in various other boxes that can be found in the track box 624. For example, the parameters can be signaled in the track header box 624a. Alternatively or additionally, the parameters can be signaled in a media header box (identified by the box type mdhd) and/or in a video media header box (identified by the box type vmhd), which are not illustrated here. Alternatively or additionally, the parameters can be indicated in a sample entry and/or in a newly defined box that can be placed at the top level of the track box 624.

[00133] Figure 7 illustrates an example of an ISO base media file 700 in which a new handler box 724d has been defined to indicate that a track includes virtual reality content. The file 700 can include a file type box 710, which can specify the brand(s) or particular iterations or derivatives of the ISOBMFF with which the file 700 is compatible. The file 700 can also include a movie box 720, which can contain the metadata for a presentation. The file 700 can also optionally include one or more fragments 730a, 730b, 730c, 730n, as discussed above.

[00134] The movie box 720 includes a movie header box 722 and one or more track boxes 724, as well as other boxes not illustrated here. The movie header box 722 can include information that describes the presentation as a whole. The track box 724 can include the information for a track in the presentation. The track box 724 can include a track header box 724a and zero or more media data boxes 724b.

[00135] As discussed above, in some cases the media data box 724b can include a handler box 724d, which can describe the manner in which the media content described by the media data box 724b is presented. In the example of figure 7, a new handler box 724d has been defined that is specific to virtual reality video data. The new handler box 724d can be identified, for example, by the box type vrvd. In this example, video players that are not compatible with virtual reality content may not be able to identify the new handler box 724d, and can thus ignore the new handler box 724d and ignore any content referred to by the track box 724. The virtual reality content will then not be rendered and displayed by a player that is not configured to display virtual reality video.

[00136] In some deployments, the new video handler box can optionally also include parameters that describe the virtual reality content, such as whether the virtual reality video is 2D or 3D, whether the virtual reality video is pre-stitched or post-stitched, and/or a mapping for the virtual reality video. In various deployments, parameters related to the virtual reality content can be indicated in various other boxes that can be found in the track box 724. For example, the parameters can be signaled in the track header box 724a. Alternatively or additionally, the parameters can be signaled in a media header box (identified by the box type mdhd) and/or in a video media header box (identified by the box type vmhd), which are not illustrated here. Alternatively or additionally, the parameters can be indicated in a sample entry and/or in a newly defined box that can be placed at the top level of the track box 724.
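A minimal sketch of reading the handler type from a handler box payload is shown below; the byte layout follows the common hdlr definition in ISO/IEC 14496-12, and the 'vrvd' comparison illustrates the new handler type described above rather than a type defined by the ISOBMFF itself.

    # Minimal sketch: read the handler type from an 'hdlr' box payload and check
    # for the virtual reality handler type described above.
    def parse_hdlr(payload):
        # FullBox header: 1 byte version + 3 bytes flags, then pre_defined (4 bytes),
        # handler_type (4 bytes), 12 reserved bytes, then a UTF-8 name string.
        handler_type = payload[8:12].decode("ascii")
        name = payload[24:].split(b"\x00")[0].decode("utf-8")
        return handler_type, name

    def track_requires_vr_player(hdlr_payload):
        handler_type, _ = parse_hdlr(hdlr_payload)
        # 'vide' is a regular video track; 'vrvd' stands for the virtual reality
        # video handler type described above.
        return handler_type == "vrvd"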
[00137] Figure 8 illustrates an example of a media box 840 that can be included in an ISO base media file. As discussed above, a media box can be included in a track box, and can contain objects and information that describe the media data in the track. In the illustrated example, the media box 840 includes a media information box 842. The media box 840 can also include other boxes, which are not illustrated here.

[00138] The media information box 842 can contain objects that describe characteristic information about the media in the track. For example, the media information box 842 can include a data information box that describes the location of the media information in the track. As another example, the media information box 842 can include a video media header when the track includes video data. The video media header can contain general presentation information that is independent of the coding of the video media. The media information box 842 can also include a sound media header when the track includes audio data.

[00139] The media information box 842 can also include a sample table box 844, as provided in the illustrated example. The sample table box 844, identified by the box type stbl, can provide locations (for example, locations within a file) for the media samples in the track, as well as time information for the samples. Using the information provided by the sample table box 844, a player device can locate the samples in correct time order, determine the type of a sample and/or determine the size, container and offset of a sample within a container, among other things.

[00140] The sample table box 844 can include a sample description box 846, identified by the box type stsd. The sample description box 846 can provide detailed information about, for example, the coding type used for a sample, and any initialization information needed for that coding type. The information stored in the sample description box can be specific to the type of the track that includes the samples. For example, one format can be used for the sample description when the track is a video track, and a different format can be used when the track is a hint track. As a further example, the format for the sample description can also vary depending on the format of the hint track.

[00141] The sample description box 846 can include one or more sample entry boxes 848a, 848b, 848c. The sample entry type is an abstract class, and so the sample description box typically includes a specific sample entry box, such as a visual sample entry for video data or an audio sample entry for audio samples, among other examples. A sample entry box can store the parameters for a particular sample. For example, for a video sample, the sample entry box can include a width, a height, a horizontal resolution, a vertical resolution, a frame count and/or a depth for the video sample, among other things. As another example, for an audio sample, the sample entry can include a channel count, a channel layout and/or a sampling rate, among other things.
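A minimal sketch of listing the sample entry formats recorded in the sample description box is shown below; it assumes a version-0 'stsd' payload laid out as a full box header, an entry count and a sequence of sample entry boxes, which is the common arrangement.

    # Minimal sketch: list the sample entry formats (4CCs) recorded in an 'stsd' box.
    def sample_entry_formats(stsd_payload):
        entry_count = int.from_bytes(stsd_payload[4:8], "big")
        formats = []
        offset = 8
        for _ in range(entry_count):
            size = int.from_bytes(stsd_payload[offset:offset + 4], "big")
            formats.append(stsd_payload[offset + 4:offset + 8].decode("ascii"))
            offset += size
        return formats

    # For a restricted track, the format seen here is a transformed sample entry
    # type; the original sample entry type is recorded in the restricted scheme
    # information box, as described in the following paragraphs.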
[00142] In the illustrated example, the first sample entry 848a includes a restricted scheme information box 860. A restricted scheme information box, identified by the box type rinf, can contain the information necessary both to understand the restricted scheme applied to a sample and the parameters of the scheme. In some cases, the author of a file may require certain actions from a player device. In such cases, the file can include a restricted scheme information box, which a player device can locate and use to determine the requirements for rendering the media content of the file. Players that may not be able to render the content can also use the restricted scheme information box to determine that they cannot render the content, and thus should not attempt to process the content. The restricted scheme information box typically includes the original sample entry type, that is, the type of the sample entry before any transformation described by the restricted scheme information box.

[00143] In various deployments, a restricted scheme can be defined for virtual reality content. In these deployments, a restricted scheme information box 860 can be added to a sample entry 848a that includes virtual reality data. The type of the restricted scheme can be specified in a scheme type box 862, identified by the box type schm. For example, an encoding corresponding to vrvd can be used to identify a restricted scheme for virtual reality content.

[00144] The restricted scheme information box 860 in the illustrated example includes a scheme information box 864, identified by the box type schi. The scheme information box 864 can store information for a specific scheme. For example, when the restricted scheme is for virtual reality content, the scheme information box 864 can include parameters for the virtual reality content. These parameters can include, for example, whether the virtual reality video is 2D or 3D, whether the virtual reality video is pre-stitched or post-stitched, and/or a mapping for the virtual reality video. In various deployments, a scheme information box can be defined for virtual reality content, specifically to contain the parameters of the virtual reality content.
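A minimal sketch of the player-side check enabled by this mechanism is shown below; the 'schm' layout follows the common scheme type box definition, the 'vrvd' scheme type stands for the virtual reality scheme described above, and the set of supported schemes is purely illustrative.

    # Minimal sketch: read the scheme_type from a 'schm' box payload and decide
    # whether this player can render the restricted samples.
    def parse_schm(payload):
        # FullBox header: 1 byte version + 3 bytes flags, then the 4CC scheme_type
        # and a 4-byte scheme_version.
        scheme_type = payload[4:8].decode("ascii")
        scheme_version = int.from_bytes(payload[8:12], "big")
        return scheme_type, scheme_version

    SUPPORTED_SCHEMES = {"vrvd"}   # illustrative: this player supports the VR scheme

    def can_render(schm_payload):
        scheme_type, _ = parse_schm(schm_payload)
        # A player that does not recognize the scheme should not attempt to
        # decode and render the restricted track.
        return scheme_type in SUPPORTED_SCHEMES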
[00145] Using the technique illustrated in figure 8, no new boxes need to be added to the ISOBMFF specification that might not be understood by legacy player devices. Even with new boxes, a legacy player device may attempt to play content that the device cannot identify, and when that content is virtual reality media, the result can be a distorted presentation. To avoid adding new boxes, a file can be generated for the virtual reality content, where the file likely includes only boxes that a legacy player device can identify. The legacy player device can further determine that the device is not able to implement the restricted scheme described by the restricted scheme information box 860, and thus does not attempt to display the virtual reality content.

[00146] The technique also offers flexibility both for legacy players and for players capable of rendering virtual reality content. A legacy player can, for example, determine whether the player understands the virtual reality scheme identified by the restricted scheme information box. When the player device cannot comply with the restricted scheme, the player device can choose not to render the content of the track, or it may be able to process the original untransformed samples instead. The restricted scheme mechanism can thus allow player devices to inspect a file to determine the requirements for rendering a bitstream, and can prevent a legacy player device from decoding and rendering files that the device may not be able to process.

[00147] In various deployments, the virtual reality content can alternatively or additionally be included in a supplemental enhancement information (SEI) message in a video bitstream. The SEI message can thus indicate that the bitstream includes virtual reality content. In various deployments, the SEI message can indicate the virtual reality content at the file level, at the movie level and/or at the track level. In various deployments, the SEI message can also include parameters that describe the properties of the virtual reality video (for example, whether the video is 2D or 3D, pre-stitched or post-stitched, etc.).

[00148] In various deployments, an extension to the ISOBMFF specification can include an rcvp sample entry type for use with a timed metadata track containing timed metadata for the recommended viewport. The extension can include an rvif box type indicating a recommended viewport information box. In some deployments, the recommended viewport can be the most-viewed viewport associated with the virtual reality data, as discussed in this document. The following provides the text of Section 7.7.5 of ISO/IEC EDIS 23090-2:201x (E), Information technology - Coded representation of immersive media (MPEG-I) - Part 2: Omnidirectional media format, dated February 7, 2018.

RECOMMENDED VIEWPORT

The recommended viewport timed metadata track indicates the viewport that should be displayed when the user does not have control of the viewing orientation or has released control of the viewing orientation.

NOTE: The recommended viewport timed metadata track can be used to indicate a recommended viewport based on a director's cut or based on measurements of viewing statistics.

The track sample entry type 'rcvp' shall be used. The sample entry of this sample entry type is specified as follows:

    class RcvpSampleEntry() extends SphereRegionSampleEntry('rcvp') {
        RcvpInfoBox(); // mandatory
    }

    class RcvpInfoBox extends FullBox('rvif', 0, 0) {
        unsigned int(8) viewport_type;
        string viewport_description;
    }

viewport_type specifies the type of the recommended viewport, as listed in Table 0.1.

TABLE 0.1 - RECOMMENDED VIEWPORT TYPES
Value      Description
0          A recommended viewport per the director's cut, that is, a viewport suggested according to the creative intent of the content author or content provider
1          A recommended viewport selected based on measurements of viewing statistics
2..239     Reserved (for use by future extensions of ISO/IEC 23090-2)
240..255   Unspecified (for use by applications or external specifications)

viewport_description is a null-terminated UTF-8 string that provides a textual description of the recommended viewport.

The sample syntax of SphereRegionSample shall be used.

shape_type shall be equal to 0 in the SphereRegionConfigBox of the sample entry.

static_azimuth_range and static_elevation_range, when present, or azimuth_range and elevation_range, when present, indicate the azimuth and elevation ranges, respectively, of the recommended viewport.

center_azimuth and center_elevation indicate the center point of the recommended viewport relative to the global coordinate axes. center_tilt indicates the tilt angle of the recommended viewport.
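To make these fields concrete, the sketch below models a recommended viewport sample with the fields named above and performs a simple containment test; the containment math shown is the straightforward azimuth/elevation-range interpretation and is only an approximation of the four-great-circle shape implied by shape_type 0.

    # Minimal sketch: a recommended-viewport sample and a simple check of
    # whether a viewing direction falls inside it. Angles are in degrees.
    from dataclasses import dataclass

    VIEWPORT_TYPES = {0: "director's cut", 1: "based on viewing statistics"}

    @dataclass
    class RecommendedViewportSample:
        center_azimuth: float
        center_elevation: float
        center_tilt: float
        azimuth_range: float
        elevation_range: float
        viewport_type: int = 0

        def contains(self, azimuth, elevation):
            # Wrap the azimuth difference into [-180, 180] before comparing.
            d_az = (azimuth - self.center_azimuth + 180.0) % 360.0 - 180.0
            d_el = elevation - self.center_elevation
            return (abs(d_az) <= self.azimuth_range / 2.0
                    and abs(d_el) <= self.elevation_range / 2.0)

    sample = RecommendedViewportSample(30.0, 10.0, 0.0, 90.0, 60.0)
    print(VIEWPORT_TYPES[sample.viewport_type])   # "director's cut"
    print(sample.contains(50.0, 0.0))             # True: within both ranges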
[00149] Figure 9 illustrates an example of a process 900 for generating a file containing virtual reality content, as described here. At 902, process 900 includes obtaining virtual reality data, where the virtual reality data represents a 360-degree view of a virtual environment. In some implementations, the virtual reality data includes virtual reality video. In some implementations, the virtual reality video can be pre-stitched. In some deployments, the frames of the virtual reality video may not be stitched together and may require post-stitching. The virtual reality data can be captured and encoded for storage and transmission to a receiving device, as discussed in this document.
[00150] At 904, process 900 includes storing the virtual reality data to a file, where the virtual reality data is stored according to a file format, where the file format specifies the position within the file of the virtual reality data and specifies the position within the file of information associated with the virtual reality data, where the information associated with the virtual reality data is stored in a track box. In many deployments, the file format is ISOBMFF or a file format derived from ISOBMFF. In some deployments, the information associated with the virtual reality data can include, for example, frame rates, resolutions, the positions within the file or within other files of video and/or audio samples, and/or other information. In some deployments, the virtual reality data can be stored in one or more media tracks in the file.
[00151] At 906, process 900 includes storing a sample entry in the track box, where the sample entry is associated with one or more samples, where the sample entry indicates that the track is a timed metadata track that contains information on a most-viewed display window associated with the virtual reality data. For example, this can be indicated by the 'mvvp' 4CC, as discussed in this document.
[00152] The most-viewed display window can be completely covered by a set of most-requested image regions. The most-requested image regions can be regions on the spherical surface of the virtual environment content that were most frequently requested by receiving devices, or viewed by previous users, at a presentation time during previous playbacks of the virtual reality content. For example, the virtual reality content can include objects that newly appear during playback, and the regions where the objects appear may be of interest to the user at the time of their appearance. As another example, the most-requested image regions can be other regions of interest to the user at specific presentation times during playback.
[00153] In some deployments, process 900 can also include storing parameters related to the virtual reality video (for example, the optional parameters described in this document) to the file. In some deployments, the parameters can be stored in a scheme information box.
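The selection of most-requested regions described above can be driven by aggregated viewing statistics. The following sketch is only an illustration of that idea; the log format and the region identifiers are hypothetical, and no particular statistics-collection mechanism is implied.
from collections import Counter, defaultdict

def most_requested_regions(view_log, top_n=1):
    # view_log: iterable of (presentation_time, region_id) pairs collected from
    # previous playbacks. For each presentation time, return the region ids that
    # were requested most often, which could back a most-viewed viewport selection.
    per_time = defaultdict(Counter)
    for t, region in view_log:
        per_time[t][region] += 1
    return {t: [r for r, _ in counts.most_common(top_n)]
            for t, counts in per_time.items()}

log = [(0, "front"), (0, "front"), (0, "left"), (5, "sky"), (5, "sky"), (5, "front")]
print(most_requested_regions(log))   # {0: ['front'], 5: ['sky']}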
[00154] Figure 10 illustrates an example of a process 1000 for extracting virtual reality content from a file, as described in this document. At 1002, process 1000 includes receiving a file containing virtual reality data, where the virtual reality data represents a 360-degree view of a virtual environment, where the virtual reality data is stored in the file according to a file format, where the file format specifies the position within the file of the virtual reality content and the position within the file of information associated with the virtual reality data, where the information associated with the virtual reality data is stored within a track box.
[00155] At 1004, process 1000 includes extracting the virtual reality data from the file, where the virtual reality data is stored in the file according to a file format, where the file format specifies the position within the file of the virtual reality data and specifies the position within the file of information associated with the virtual reality data, where the information associated with the virtual reality data is stored within a track box. In many deployments, the file format is based on an ISOBMFF format.
[00156] At 1006, process 1000 includes extracting a sample entry from the track box, where the sample entry is associated with one or more samples, where the sample entry indicates that the track is a timed metadata track that contains information on a most-viewed viewport associated with the virtual reality data. The virtual reality data can then be decoded and rendered on a video display device, as discussed below.
[00157] Figure 11 illustrates an example of a process 1100 for decoding and rendering a virtual reality environment, as described here. For example, process 1100 can run on a video display device. At 1102, process 1100 includes receiving virtual reality data, where the virtual reality data represents a 360-degree view of a virtual environment. The virtual reality data can include video data and audio data. The virtual reality data may have been generated and extracted by the processes illustrated in figure 9 and figure 10.
[00158] At 1104, process 1100 includes decoding the virtual reality data. Decoding can proceed as discussed in this document, in accordance with the file format. In many deployments, the file format can be based on an ISO base media file format.
[00159] At 1106, process 1100 includes rendering the virtual environment represented by the virtual reality data for display to a user. The rendering can use the information on the most-viewed display window discussed in this document. In some deployments, the most-viewed viewport can be a spherical region viewport specified by four great circles. In other deployments, the most-viewed viewport can be a spherical rectangular viewport specified by two yaw circles and two tilt circles.
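To make the rendering step concrete, the sketch below converts a signalled viewport center into a unit viewing direction that a renderer could aim the virtual camera at. The axis convention is an assumption chosen only for illustration, and any fixed-point scaling used in the file format is assumed to have been converted to degrees already.
import math

def viewport_center_direction(center_azimuth_deg, center_elevation_deg):
    # Azimuth is taken as a rotation around the vertical (Y) axis and elevation
    # as a tilt up from the horizontal plane; the result is a unit direction vector.
    az = math.radians(center_azimuth_deg)
    el = math.radians(center_elevation_deg)
    x = math.cos(el) * math.sin(az)
    y = math.sin(el)
    z = math.cos(el) * math.cos(az)
    return (x, y, z)

print(tuple(round(v, 3) for v in viewport_center_direction(0.0, 0.0)))    # (0.0, 0.0, 1.0) straight ahead
print(tuple(round(v, 3) for v in viewport_center_direction(90.0, 0.0)))   # (1.0, 0.0, 0.0) to the side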
[00160] In some examples, processes 900, 1000 and 1100 can be performed by a computing device or apparatus, such as system 100. For example, processes 900, 1000 and/or 1100 can be performed by system 100 and/or by storage 108 or output 110 shown in figure 1. In some cases, the computing device or apparatus can include a processor, microprocessor, microcomputer or other component of a device that is configured to carry out the steps of processes 900, 1000 or 1100. In some examples, the computing device or apparatus can include a camera configured to capture video data (for example, a video stream) including video frames. For example, the computing device can include a camera device (for example, an IP camera or other type of camera device) that can include a video codec. In some examples, a camera or other capture device that captures the video data is separate from the computing device, in which case the computing device receives the captured video data. The computing device can also include a network interface configured to communicate the video data. The network interface can be configured to communicate Internet Protocol (IP) based data.
[00161] Processes 900, 1000 and 1100 are illustrated as logical flow diagrams, whose operations represent a sequence of operations that can be implemented in hardware, computer instructions or a combination of these. In the context of computer instructions, the operations represent computer-executable instructions stored on one or more computer-readable storage media that, when executed by one or more processors, perform the recited operations. Generally, computer-executable instructions include routines, programs, objects, components, data structures and the like that perform particular functions or implement particular data types. The order in which the operations are described should not be interpreted as a limitation, and any number of the described operations can be combined in any order and/or in parallel to implement the processes.
[00162] In addition, processes 900, 1000 and 1100 can be performed under the control of one or more computer systems configured with executable instructions and can be implemented as code (for example, executable instructions, one or more computer programs or one or more applications) executing collectively on one or more processors, by hardware, or combinations thereof. As noted above, the code can be stored on a computer-readable or machine-readable storage medium, for example, in the form of a computer program comprising a plurality of instructions executable by one or more processors. The computer-readable or machine-readable storage medium can be non-transitory.
[00163] Specific details of the encoding device 1204 and the decoding device 1312 are shown in figures 12 and 13, respectively. Figure 12 is a block diagram illustrating an exemplary encoding device 1204 that can implement one or more of the techniques described in this invention. The encoding device 1204 can, for example, generate the syntax structures described in this document (for example, the syntax structures of a VPS, SPS, PPS or other syntax elements). The encoding device 1204 can perform intra-prediction and inter-prediction coding of video blocks within video slices. Intra-coding depends, at least in part, on spatial prediction to reduce or eliminate spatial redundancy within a given video frame or picture. Inter-coding depends, at least in part, on temporal prediction to reduce or eliminate temporal redundancy within adjacent frames or pictures of a video sequence. Intra mode (mode I) can refer to any of several space-based compression modes. Inter modes, such as unidirectional prediction (P mode) or bi-prediction (B mode), can refer to any of several time-based compression modes.
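The spatial prediction just mentioned can be pictured with a toy DC-style intra prediction: the block is predicted as the mean of the already reconstructed neighbours above and to the left, and only the residual is carried forward. This is an illustration only; the actual intra-prediction modes of unit 46 described below are more numerous and more elaborate.
def dc_intra_prediction(above, left):
    # Predict every pixel of a square block as the mean of the neighbouring pixels.
    neighbours = list(above) + list(left)
    dc = round(sum(neighbours) / len(neighbours))
    size = len(above)
    return [[dc] * size for _ in range(size)]

def residual(block, prediction):
    # The residual the encoder would go on to transform, quantize and entropy code.
    return [[b - p for b, p in zip(brow, prow)] for brow, prow in zip(block, prediction)]

above = [100, 102, 101, 99]
left = [98, 97, 100, 101]
block = [[100, 101, 99, 100],
         [102, 100, 98, 99],
         [101, 100, 100, 101],
         [99, 98, 100, 102]]
pred = dc_intra_prediction(above, left)
print(pred[0][0])                 # 100 -> the DC value
print(residual(block, pred)[0])   # [0, 1, -1, 0]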
[00164] The encoding device 1204 includes a partitioning unit 35, prediction processing unit 41, filter unit 63, picture memory 64, adder 50, transform processing unit 52, quantization unit 54 and entropy encoding unit 56. The prediction processing unit 41 includes motion estimation unit 42, motion compensation unit 44 and intra-prediction processing unit 46. For video block reconstruction, the encoding device 1204 also includes inverse quantization unit 58, inverse transform processing unit 60 and adder 62. The filter unit 63 is intended to represent one or more loop filters, such as a deblocking filter, an adaptive loop filter (ALF) and a sample adaptive offset (SAO) filter. Although filter unit 63 is shown in figure 12 as an in-loop filter, in other configurations filter unit 63 can be implemented as a post-loop filter. A post-processing device 57 can perform additional processing on the encoded video data generated by the encoding device 1204. The techniques of this invention can, in some cases, be implemented by the encoding device 1204. In other cases, however, one or more of the techniques of this invention can be implemented by the post-processing device 57.
[00165] As shown in figure 12, the encoding device 1204 receives video data, and the partitioning unit 35 partitions the data into video blocks. This partitioning can also include partitioning into slices, slice segments, tiles or other larger units, as well as partitioning of video blocks, for example, according to a quadtree structure of LCUs and CUs. The encoding device 1204 generally illustrates the components that encode video blocks within a video slice to be encoded. The slice can be divided into several video blocks (and possibly into sets of video blocks referred to as tiles). The prediction processing unit 41 can select one of a plurality of possible encoding modes, such as one of a plurality of intra-prediction encoding modes or one of a plurality of inter-prediction encoding modes, for the current video block based on error results (for example, the encoding rate and the level of distortion, or the like). The prediction processing unit 41 can supply the resulting intra- or inter-coded block to adder 50 to generate residual block data and to adder 62 to reconstruct the encoded block for use as a reference picture.
[00166] The intra-prediction processing unit 46 within the prediction processing unit 41 can perform intra-predictive encoding of the current video block relative to one or more neighboring blocks in the same frame or slice as the current block to be encoded, to provide spatial compression. The motion estimation unit 42 and the motion compensation unit 44 perform inter-predictive encoding of the current video block relative to one or more predictive blocks in one or more reference pictures, to provide temporal compression.
[00167] The motion estimation unit 42 can be configured to determine the inter-prediction mode for a video slice according to a predetermined pattern for a video sequence. The predetermined pattern can designate video slices in the sequence as P slices, B slices or GPB slices. The motion estimation unit 42 and the motion compensation unit 44 can be highly integrated, but are illustrated separately for conceptual purposes. Motion estimation, performed by the motion estimation unit 42, is the process of generating motion vectors, which estimate motion for video blocks. A motion vector, for example, can indicate the displacement of a prediction unit (PU) of a video block within the current video frame or picture relative to a predictive block within a reference picture.
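As a toy illustration of how such a motion vector can be chosen, the sketch below slides the current block over a small window of a reference frame and keeps the displacement with the lowest pixel-difference cost, using the sum of absolute differences discussed in the next paragraph. Real motion estimation in unit 42 uses fractional-pel interpolation, multiple block sizes and much faster search strategies that are not represented here.
def sad(block_a, block_b):
    # Sum of absolute differences between two equally sized pixel blocks.
    return sum(abs(a - b) for row_a, row_b in zip(block_a, block_b)
                          for a, b in zip(row_a, row_b))

def full_search(cur_block, ref_frame, top, left, radius):
    # Exhaustively test every integer displacement within `radius` of (top, left)
    # and return the displacement with the lowest SAD together with that cost.
    h, w = len(cur_block), len(cur_block[0])
    best = (None, float("inf"))
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if 0 <= y <= len(ref_frame) - h and 0 <= x <= len(ref_frame[0]) - w:
                cand = [row[x:x + w] for row in ref_frame[y:y + h]]
                cost = sad(cur_block, cand)
                if cost < best[1]:
                    best = ((dy, dx), cost)
    return best  # ((dy, dx) motion vector, SAD cost)

ref = [[r * 8 + c for c in range(8)] for r in range(8)]
cur = [row[3:7] for row in ref[2:6]]                      # a 4x4 patch taken from the reference
print(full_search(cur, ref, top=0, left=0, radius=4))     # -> ((2, 3), 0)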
[00168] A predictive block is a block found to closely match the PU of the video block to be encoded in terms of pixel difference, which can be determined by the sum of absolute differences (SAD), the sum of squared differences (SSD) or other difference metrics. In some examples, the encoding device 1204 can calculate values for sub-integer pixel positions of reference pictures stored in picture memory 64. For example, the encoding device 1204 can interpolate values for one-quarter pixel positions, one-eighth pixel positions or other fractional pixel positions of the reference picture. Therefore, the motion estimation unit 42 can perform a motion search over the full pixel positions and the fractional pixel positions and output a motion vector with fractional pixel precision.
[00169] The motion estimation unit 42 calculates a motion vector for a PU of a video block in an inter-coded slice by comparing the position of the PU to the position of a predictive block of a reference picture. The reference picture can be selected from a first reference picture list (List 0) or a second reference picture list (List 1), each of which identifies one or more reference pictures stored in the picture memory 64. The motion estimation unit 42 sends the calculated motion vector to the entropy encoding unit 56 and the motion compensation unit 44.
[00170] Motion compensation, performed by the motion compensation unit 44, can involve fetching or generating the predictive block based on the motion vector determined by motion estimation, possibly performing interpolations to sub-pixel precision. Upon receiving the motion vector for the PU of the current video block, the motion compensation unit 44 can locate the predictive block to which the motion vector points in one of the reference picture lists. The encoding device 1204 forms a residual video block by subtracting the pixel values of the predictive block from the pixel values of the current video block being encoded, forming pixel difference values. The pixel difference values form residual data for the block, and can include both luminance and chrominance difference components. The adder 50 represents the component or components that perform this subtraction operation. The motion compensation unit 44 can also generate syntax elements associated with the video blocks and the video slice for use by the decoding device 1312 in decoding the video blocks of the video slice.
[00171] The intra-prediction processing unit 46 can intra-predict a current block, as an alternative to the inter-prediction performed by the motion estimation unit 42 and the motion compensation unit 44, as described above. In particular, the intra-prediction processing unit 46 can determine an intra-prediction mode to be used to encode a current block.
In some examples, the intraprediction processing unit 46 can encode an ongoing block using various intraprediction modes, for example, during separate encoding passes, and the intraprediction processing unit 46 (or the mode selection unit 40, in some examples) you can select an appropriate intraprediction mode to use the tested modes. For example, the intraprediction processing unit 46 can calculate distortion rate values using an analysis of the distortion rate for the different intraprediction modes tested, and can select the intraprediction mode with the best characteristics of the distortion rate between the media. tested. The distortion rate analysis generally determines an amount of distortion (or error) between a coded block and an original, uncoded block that has been coded to produce the coded block, as well as a bit rate (that is, a number of bits) used to produce the coded block. The intraprediction processing unit 46 can calculate the Petition 870190092767, of 9/17/2019, p. 95/141 92/109 distortion ratios and ratios for the various coded blocks to determine which intraprediction mode has the best distortion rate value for the block. [00172] In any case, after selecting an intraprediction mode for a block, the intraprediction processing unit 46 can supply information indicative of the intraprediction mode selected for the block to the entropy coding unit 56. The entropy coding unit 56 can encode the information indicating the selected intraprediction mode. The coding device 1204 may include in the transmitted bit stream settings data settings for coding contexts for several blocks, as well as indications of a more likely intraprediction mode, an intraprediction mode index table and a modified index table the intrapredition mode to be used for each of the contexts. The bitstream configuration data may include a plurality of intraprediction mode index tables and a plurality of intraprediction mode index tables (also referred to as codeword mapping tables). [00173] After the processing unit 41 generates the predictive block for the current video block, or through intraprediction or interpretation, the encoding device 1204 forms a residual video block by subtracting the predictive block from the block video in progress. Residual video data in the residual block can be included in one or more TUs and applied to transform the transform processing unit 52. The transform processing unit Petition 870190092767, of 9/17/2019, p. 96/141 93/109 transforms residual video data into residual transform coefficients using a transform, such as a discrete cosine transform (DCT) or a conceptually similar transform. The transform processing unit 52 can convert the residual video data from a pixel domain to a transform domain, such as a frequency domain. [00174] The transform processing unit 52 can send the resulting transform coefficients to the quantization unit 54. The quantization unit 54 quantizes the transform coefficients to further reduce the bit rate. The quantization process can reduce the bit depth associated with some or all of the coefficients. The degree of quantization can be modified by adjusting a quantization parameter. In some examples, the quantization unit 54 can then perform a digitization of the matrix, including the quantized transform coefficients. Alternatively, the entropy coding unit 56 can perform the digitization. [00175] After quantization, the entropy coding unit 56 entropy codes the quantized transform coefficients. 
For example, the entropy coding unit 56 can perform context-adaptive variable-length coding (CAVLC), context-adaptive binary arithmetic (CABAC), syntax-based context-adaptive binary coding (SBAC), entropy coding probability interval segmentation (PIPE) or other entropy coding technique. After Petition 870190092767, of 9/17/2019, p. 97/141 94/109 entropy coding by the entropy coding unit 56, the encoded bit stream can be transmitted to the decoding device 1312, or archived for later transmission or retrieval by the decoding device 1312. The entropy coding unit 56 can also entropy the motion vectors and other syntax elements for the current video slice being encoded. [00176] The inverse quantization unit 58 and the inverse transform processing unit 60 apply reverse quantization and inverse transformation, respectively, to reconstruct the residual block in the pixel domain for later use as a reference block of a reference image. The motion compensation unit 44 can calculate a reference block by adding the residual block to a predictive block of one of the reference images within a list of reference images. The motion compensation unit 44 can also apply one or more interpellation filters to the reconstructed residual block to calculate the sub-integer pixel values for use in the motion estimate. Adder 62 adds the reconstructed residual block to the compensated motion prediction block produced by the motion compensation unit 44 to produce a reference block for storage in the image memory 64. The reference block can be used by the motion estimate unit 42 and the motion compensation unit 44 as a reference block for intrapredicting a block in a subsequent video frame or image. Petition 870190092767, of 9/17/2019, p. 98/141 95/109 [00177] Thus, the encoding device 1204 of figure 12 represents an example of a video encoder configured to generate the syntax for an encoded video bit stream. The encoding device 1204 can, for example, generate sets of VPS, SPS, and PPS parameters, as described above. The coding device 1204 can perform any of the techniques described in this document, including the processes described above, with respect to figures 12 and 13. The techniques of this invention have been described, in general, in relation to the 1204 coding device, but, as mentioned above, some of the techniques of the invention can also be implemented by the post-processing device 57. [00178] Figure 13 is a block diagram illustrating an example of the decoding device 1312. The decoding device 1312 includes an entropy decoding unit 80, prediction processing unit 81, inverse quantization unit 86, reverse transform processing 88, adder 90, filter unit 91 and image memory 92. The prediction processing unit 81 includes the motion compensation unit 82 and the intraprediction processing unit 84. The decoding device 1312 can , in some examples, performing a decoding pass in general reciprocal to the encoding pass described with respect to the video encoder 1204 of figure 12. [00179] During the decoding process, the decoding device 1312 receives an encoded video bit stream that represents video blocks from an encoded video slice and associated syntax elements Petition 870190092767, of 9/17/2019, p. 99/141 96/109 sent by the encoding device 1204. In some embodiments, the decoding device 1312 can receive the encoded video bit stream from the encoding device 1204. 
In some embodiments, the decoding device 1312 can receive the video bit stream encoded from a network entity 79, such as a server, a network element that recognizes the media (MANE), a video editor / segmentator or other similar device configured to implement one or more of the techniques described above. Network entity 79 may or may not include encoding device 1204. Some of the techniques described in this invention may be implemented by network entity 79 before network entity 79 transmits the encoded video bit stream to the decoding device 1312. In some video decoding systems, network entity 79 and decoding device 1312 may be part of separate devices, while in other cases, the functionality described with respect to network entity 79 may be performed by the same device comprising the decoding device 1312. [00180] The entropy decoding unit 80 of decoding device 1312 entropy decodes the bit stream to generate quantized coefficients, motion vectors and other syntax elements. The entropy decoding unit 80 forwards the motion vectors and other syntax elements to the prediction processing unit 81. The decoding device 1312 can receive the syntax elements at the video slice level and / or at the video level. Petition 870190092767, of 9/17/2019, p. 100/141 97/109 video block. The entropy decoding unit 80 can process and analyze both fixed-length syntax elements and variable-length syntax elements in one or more sets of parameters, such as a VPS, SPS and PPS. [00181] When the video slice is encoded as an intra-coded slice (I), the intraprediction processing unit 84 of the prediction processing unit 81 can generate prediction data for a video block of the current video slice based on in a signaled intraprediction mode and block data previously decoded from the current frame or photo. When the video frame is encoded as an intercoded slice (for example, B, P or GPB), the motion compensation unit 82 of the prediction processing unit 81 produces predictive blocks for a video block of the current video slice based on motion vectors and other syntax elements received from entropy decoding unit 80. Predictive blocks can be produced from one of the reference images within a list of reference images. The decoding device 1312 can build the lists of reference frames, List 0 and List 1, using standard construction techniques based on the reference images stored in the image memory 92. [00182] Motion compensation unit 82 determines prediction information for a video block from the current video slice by analyzing motion vectors and other syntax elements, and uses the prediction information to produce the predictive blocks for the block in Petition 870190092767, of 9/17/2019, p. 101/141 98/109 video in progress being decoded. For example, motion compensation unit 82 can use one or more elements of syntax in a set of parameters to determine a prediction mode (for example, intra- or interpredition) used to encode the video blocks of the video slice, a type of interpretation slice (for example, slice B, slice P, or GPB slice), construction information for one or more of the reference image lists for the slice, motion vectors for each slice's intercodified video block, state interpretation for each intercodified video block in the slice and other information to decode the video blocks in the current video slice. [00183] The movement compensation unit 82 can also perform interpellation based on interpellation filters. 
The motion compensation unit 82 can use interpellation filters as used by the encoding device 1204 when encoding the video blocks to calculate interpellated values for subintelligent pixels of reference blocks. In this case, the motion compensation unit 82 can determine the interpellation filters used by the encoding device 1204 of the received syntax elements, and can use the interpellation filters to produce predictive blocks. [00184] The inverse quantization unit 86 quantizes in reverse, or dequantizes, the quantized transform coefficients provided in the bit stream and decoded by the entropy decoding unit 80. The reverse quantization process may include the use of a quantization parameter calculated by the Petition 870190092767, of 9/17/2019, p. 102/141 99/109 encoding 1204 for each video block in the video slice to determine a degree of quantization and, likewise, a degree of inverse quantization that must be applied. The reverse transform processing unit 88 applies a reverse transform (for example, a reverse DCT or other suitable reverse transform), a reverse integer transform or a conceptually similar reverse transform process to the transform coefficients to produce residual blocks in the domain of the pixel. [00185] After the motion compensation unit 82 generates the predictive block for the current video block based on the motion vectors and other syntax elements, the decoding device 1312 forms a video block decoded by the sum of the blocks residuals of the reverse transform processing unit 88 with the corresponding predictive blocks generated by the motion compensation unit 82. The adder 90 represents the component or components that perform this sum operation. If desired, other loop filters (either in the encoding loop or after the encoding loop) can also be used to smooth out pixel transitions, or otherwise improve video quality. Filter unit 91 is intended to represent one or more loop filters, such as an unblocking filter, an adaptive loop filter (ALF) and an adaptive sample displacement filter (SAO). Although filter unit 91 is shown in figure 13 as a loop filter, in other configurations, filter unit 91 can be implemented as a post-loop filter. The video blocks decoded in a given frame or image are stored Petition 870190092767, of 9/17/2019, p. 103/141 100/109 in image memory 92, which stores reference images used for subsequent motion compensation. The image memory 92 also stores decoded video for later presentation on a display device, such as the target device 122 shown in Figure 1. [00186] In the previous description, aspects of the application are described with reference to their embodiments, but those skilled in the art will recognize that the invention is not limited to those embodiments. Thus, although illustrative embodiments of the application have been described in detail in this document, it should be understood that inventive concepts can be incorporated and employed in various ways, and that the appended claims must be interpreted to include these variations, unless limited by the prior art. Various features and aspects of the invention described above can be used individually or together. In addition, the embodiments can be used in any number of environments and applications other than those described here, without departing from the broader spirit and scope of the specification. The specification and the drawings, therefore, should be considered as illustrative and not restrictive. 
For purposes of illustration, the methods have been described in a specific order. It should be appreciated that, in alternate embodiments, the methods may be performed in a different order than described. [00187] Where components are described as being configured to perform certain Petition 870190092767, of 9/17/2019, p. 104/141 101/109 operations, this configuration can be performed, for example, by designing electronic circuits or other hardware to perform the operation, programming programmable electronic circuits (for example, microprocessors or other suitable electronic circuits) to perform the operation, or any combination of these. [00188] The various illustrative logic blocks, modules, circuits and algorithm steps described in connection with the embodiments described here can be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this alternation between hardware and software, several components, blocks, modules, circuit and illustrative steps have been described above, in general terms in terms of their functionality. Whether this functionality is implemented as hardware or software, depends on the particular application and design restrictions imposed on the system in general. Those skilled in the art can implement the functionality described in a variety of ways for each particular application, but such implementation decisions should not be construed as departing from the scope of the present invention. [00189] The techniques described here can be implemented in hardware, software, firmware or any combination of them. These techniques can be implemented on any of a variety of devices, such as general purpose computers, wireless communication device devices or multipurpose integrated circuit devices, including application on wireless communication device devices and other devices. Petition 870190092767, of 9/17/2019, p. 105/141 102/109 Any features described as modules or components can be implemented together in an integrated logic device or separately as discrete yet interoperable logic devices. If implemented in software, the techniques can be performed, at least in part, through a computer readable data storage medium comprising program code including instructions that, when executed, perform one or more of the methods described above. The computer readable data storage medium may be part of a computer program product, which may include packaging materials. The computer-readable medium can comprise memory or data storage medium, such as random access memory (RAM) as synchronous dynamic random access memory (SDRAM), read-only memory (ROM), non-volatile random access memory ( NVRAM), programmable and electrically erasable read-only memory (EEPROM), FLASH memory, magnetic or optical data storage media, and the like. The techniques, additionally or alternatively, can be carried out, at least in part, by a computer-readable communication medium that loads or communicates program code in the form of instructions or data structures and which can be accessed, read and / or executed by a computer, such as signals or propagated waves. [00190] the program code may be executed by a processor, which may include one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, circuits Petition 870190092767, of 9/17/2019, p. 106/141 103/109 application-specific integrated (ASICs), field programmable logic arrays (FPGAs), or other discrete or integrated equivalent logic circuits. 
This processor can be configured to perform any of the techniques described in this publication. A general purpose processor can be a microprocessor, but, alternatively, the processor can be any commercially available processor, controller, microcontroller or conventional state machine. A processor can also be implemented as a combination of computing devices, for example, a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other configuration of this type. Therefore, the term processor, as used in this document, can refer to any structure previously mentioned, any combination of the structures mentioned above, or any other structure or apparatus suitable for implementing the techniques described in this document. In addition, in some respects, the features described here can be provided in software modules or dedicated hardware modules configured for encoding and decoding, or incorporated into a combined video encoder-decoder (CODEC). [00191] The encoding techniques discussed here can be incorporated into an example of the video encoding and decoding system. A system includes a source device that provides encoded video data to be decoded later by a Petition 870190092767, of 9/17/2019, p. 107/141 104/109 target device. In particular, the source device provides video data to a destination device via a computer-readable medium. The source device and the target device 14 can comprise any of a wide variety of devices, including desktop computers, notebook computers (i.e. laptops), tablets, set-top boxes, telephone devices such as so-called smart phones, so-called devices smart phones, televisions, cameras, display devices, digital media players, video game consoles, streaming video devices or the like. In some cases, the source device and the destination device may be equipped for wireless communication. [00192] The destination device can receive the encoded video data to be decoded through the computer reading medium. The computer reading medium can comprise any type of medium or device capable of moving the encoded video data from a source device to a destination device. In one example, the computer reading medium may comprise a communication medium for enabling a source device to transmit encoded video data directly to the destination device in real time. The encoded video data can be modulated according to a communication standard, such as a wireless communication protocol, and transmitted to a destination device. The communication medium may comprise any wireless or wired communication medium, such as a radio frequency (RE) spectrum or one or more physical lines of Petition 870190092767, of 9/17/2019, p. 108/141 105/109 transmission. The communication medium can be part of a packet-based network, such as a local area network, a wide area network, or a global network such as the Internet. The communication medium may include routers, switches, base stations, or any other equipment that may be useful to facilitate communication from the source device to the destination device. [00193] In some examples, encrypted data can be output from the output interface to a storage device. Likewise, encrypted data can be accessed from the storage device via the input interface. 
The storage device may include any of a variety of distributed or locally accessed data storage media, such as a hard disk drive, Blu-ray discs, DVDs, CD-ROMs, flash memory, volatile or non-volatile memory, or any other digital storage media for storing encoded video data. In another example, the storage device may correspond to a file server or other intermediate storage device that can hold the encoded video generated by the source device. The target device can access stored video data from the storage device via streaming or downloading. The file server can be any type of server capable of storing encoded video data and transmitting that encoded video data to the target device. Exemplary file servers include a network server (for example, for a website), an FTP server, Petition 870190092767, of 9/17/2019, p. 109/141 106/109 storage connected to the network (NAS) or a local disk drive. The target device can access encoded video data through any standard data connection, including an Internet connection. This can include a wireless channel (for example, a Wi-Fi connection), a wired connection (for example, DSL, cable modem, etc.), or a combination of both that is suitable for accessing stored encoded video data on a file server. The transmission of encoded video data from the storage device can be a transmission via streaming, a transmission via download or a combination of both. [00194] The techniques of this invention are not necessarily limited to wireless applications or configurations. The techniques can be applied to encode video in support of any of a variety of multimedia applications, such as over-the-air television broadcasts, cable television broadcasts, satellite television broadcasts, video streams via Internet streaming, such as dynamic adaptive streaming over HTTP (DASH), digital video that is encoded to a data storage medium, decoding digital video stored on a data storage medium or other applications. In some examples, a system can be configured to support unidirectional or bidirectional video transmission to support applications such as video streaming, video playback, video broadcasting and / or video telephony. [00195] In one example, the source device Petition 870190092767, of 9/17/2019, p. 110/141 107/109 includes a video source, a video encoder and an output interface. The target device may include an input interface, a video decoder and a display device 32. The source device's video encoder can be configured to apply the techniques disclosed herein. In other examples, a source device and a target device may include other components or arrangements. For example, the source device can receive video data from an external video source, such as an external camera. Likewise, the target device can interact with an external display video, instead of including an integrated video device. [00196] The example system above is just an example. Techniques for processing video data in parallel can be performed by any digital video encoding and / or decoding device. Although, in general, the techniques of this invention are performed by a video encoding device, the techniques can also be performed by a video encoder / decoder, commonly referred to as a CODEC. In addition, the techniques of this invention can also be performed by a video preprocessor. Source devices and target devices are just examples of such encoding devices where a source device generates encoded video data for transmission to the target device. 
In some instances, the source and destination devices may operate in a substantially symmetrical manner, such that each of the devices includes coding components and Petition 870190092767, of 9/17/2019, p. 111/141 108/109 video decoding. Thus, exemplary systems can support unidirectional or bidirectional video transmission between video devices, for example, for video streaming, video playback, video broadcasting or video telephony. [00197] The video source may include a video capture device, such as a video camera, a video file containing the previously captured video and / or a video feed interface for receiving video from a video content provider . Alternatively, the video source can generate computer-based data, such as the source video, or a combination of live video, archived video and computer generated video. In some cases, if a video source is a video camera, a source device and a destination device can form so-called camera phones or videophones. However, as mentioned above, the techniques described in this publication may be applicable to video encoding in general, and can be applied to wireless and / or wired applications. In each case, the captured, pre-captured or computer generated video can be encoded by the video encoder. The encoded video information can then be output via the output interface on the computer reading medium. [00198] As noted, the computer reading medium may include a transient medium, such as transmission via wireless broadcast or wired network, or storage media (i.e., non-transitory storage media), such as a hard disk, flash drive, disk Petition 870190092767, of 9/17/2019, p. 112/141 109/109 compact, digital video disc, Blu-ray disc or other means of reading by computer. In some examples, a network server (not shown) can receive encoded video data from the source device and provide the encoded video data to the destination device, for example, via network transmission. Likewise, a computing device of a media production unit, such as a disk printing unit, can receive encoded video data from the source device and produce a disc containing the encoded video data. Therefore, the computer reading medium can be understood to include one or more computer reading means in various ways, in various examples. [00199] The target device's input interface receives information from the reading medium by computer. Information from the computer reading medium may include syntax information defined by the video encoder, which is also used by the video decoder, which includes elements of syntax that describe the characteristics and / or processing of blocks and other encoded units, for example. example, image group (GOP). A display device displays the decoded video data for a user, and can comprise any of a variety of display devices, such as a cathode ray tube (CRT), a liquid crystal display (LCD), a plasma display , an organic light-emitting diode (OLED) screen, or other type of display device. Various embodiments of the invention have been described.
Claims:
Claims (30) [1] 1. Method of decoding and displaying virtual reality data, characterized by the fact that it comprises: receive a file containing virtual reality data, in which the virtual reality data represents a 360 degree view of a virtual environment; extracting virtual reality data from a file, where virtual reality data is stored in the file according to a file format, where the file format specifies the position within the file of the virtual reality data and specifies the position within the information file associated with the virtual reality data, in which the information associated with the virtual reality data is stored inside a track box. extract a sample entry from the track box, where the sample entry is associated with one or more samples, where the sample entry indicates that the track is a timed metadata track that contains information in a more display window visualized associated with virtual reality data; and decode and render the virtual reality data for display to a user. [2] 2. Method, according to claim 1, characterized by the fact that the information in the most viewed display window associated with virtual reality data comprises identification data of a type of shape and identification data of a region display window spherical specified by four large circles. Petition 870190092767, of 9/17/2019, p. 114/141 2/10 [3] 3. Method, according to claim 1, characterized by the fact that the information in the most viewed display window associated with virtual reality data comprises identification data of a type of shape and identification data of a spherical rectangular display window specified by two yaw circles and two slope circles. [4] 4. Method, according to claim 1, characterized by the fact that the most viewed display window can be associated with a time of presentation of virtual reality data to the user. [5] 5. Method, according to claim 4, characterized by the fact that the most viewed display window associated with virtual reality data is selected from the group consisting of: a display window fully covered by a set of most requested image regions based on statistical indications of viewing virtual reality data at presentation time, a recommended display window for displaying virtual reality data, a standard display window without user control over a virtual reality data display orientation, a display window defined by the director of virtual reality data, and a display window defined by the producer of virtual reality data. [6] 6. Method, according to claim 1, characterized by the fact that extracting data from Petition 870190092767, of 9/17/2019, p. 115/141 3/10 virtual reality from archives comprises extracting virtual reality data from one or more media tracks in the archive. [7] 7. Method, according to claim 1, characterized by the fact that the virtual reality data is rendered and displayed using the information in the most viewed display window associated with the virtual reality data. [8] 8. Method, according to claim 1, characterized by the fact that the file format is based on a media file format of the International Organization for Standardization (ISO). [9] 9. Device for decoding and displaying virtual reality data, characterized by the fact that it comprises: a receiver configured to receive a file containing virtual reality data, in which the virtual reality data represents a 360 degree view of a virtual environment. 
a processor configured to extract virtual reality data from the file, where the virtual reality data is stored in the file according to a file format, where the file format specifies the position within the file of the virtual reality data and specifies the position within the information file associated with the virtual reality data, where the information associated with the virtual reality data is stored inside a track box, extracting a sample entry from the box Petition 870190092767, of 9/17/2019, p. 116/141 4/10 track, where the sample entry is associated with one or more samples, where the sample entry indicates that the track is a timed metadata track that contains information in a more viewed display window associated with reality data virtual; and decode and render the virtual reality data for display to a user. [10] 10. Apparatus, according to claim 9, characterized by the fact that the information in the most viewed display window associated with virtual reality data comprises identification data of a type of shape and identification data of a region display window spherical specified by four large circles. [11] 11. Method, according to claim 9, characterized by the fact that the information in the most viewed display window associated with virtual reality data comprises identification data of a type of shape and identification data of a spherical rectangular display window specified by two yaw circles and two slope circles. [12] 12. Apparatus, according to claim 9, characterized by the fact that the most viewed display window is associated with a time of presentation of virtual reality data to the user. [13] 13. Device, according to claim 12, characterized by the fact that the most viewed display window associated with virtual reality data is selected from the group consisting of: a viewport fully covered by a set of most requested image regions based on Petition 870190092767, of 9/17/2019, p. 117/141 5/10 statistical indications for viewing virtual reality data at presentation time, a recommended display window for displaying virtual reality data, a standard display window without user control over a virtual reality data display orientation, a viewport defined by the director of virtual reality data, and a viewport defined by the producer of virtual reality data. [14] 14. Apparatus, according to claim 9, characterized by the fact that extracting the virtual reality data from the files comprises extracting the virtual reality data from one or more media tracks of the file. [15] 15. Device, according to claim 9, characterized by the fact that the virtual reality data is rendered and displayed using the information in the most viewed display window associated with the virtual reality data. [16] 16. Apparatus according to claim 9, characterized by the fact that the file format is based on a media file format of the International Organization for Standardization (ISO). [17] 17. Virtual reality data storage method, characterized by the fact that it includes: obtaining virtual reality data, in which the virtual reality data represents a 360 degree view of a virtual environment; Petition 870190092767, of 9/17/2019, p. 
118/141 6/10 store the virtual reality data in a file, where the virtual reality data is stored in the file according to a file format, where the file format specifies the position within the file of the virtual reality data and specifies the position within the information file associated with virtual reality data, where the information associated with virtual reality data is stored within a track box; and store a sample entry from the track box, where the sample entry is associated with one or more samples, where the sample entry indicates that the track is a timed metadata track that contains information in a display window most visualized associated with virtual reality data. [18] 18. Method, according to claim 17, characterized by the fact that the information in the most viewed display window associated with virtual reality data comprises identification data of a type of shape and identification data of a region display window spherical specified by four large circles. [19] 19. Method, according to claim 17, characterized by the fact that the information in the most viewed display window associated with virtual reality data comprises identification data of a type of shape and identification data of a spherical rectangular display window specified by two yaw circles and two slope circles. [20] 20. Method, according to claim 17, characterized by the fact that the display window more Petition 870190092767, of 9/17/2019, p. 119/141 7/10 visualization is associated with a time of presentation of virtual reality data to a user. [21] 21. Method, according to claim 20, characterized by the fact that the most viewed display window associated with virtual reality data is selected from the group consisting of: a display window fully covered by a set of most requested image regions based on statistical indications of viewing virtual reality data at presentation time, a recommended display window for displaying virtual reality data, a standard display window without user control over a virtual reality data display orientation, a display window defined by the director of virtual reality data, and a display window defined by the producer of virtual reality data. [22] 22. Method, according to claim 17, characterized by the fact that extracting the virtual reality data from the files comprises extracting the virtual reality data from one or more media tracks of the file. [23] 23. Method, according to claim 17, characterized by the fact that the file format is based on a media file format of the International Organization for Standardization (ISO). [24] 24. Device for storing virtual reality data, characterized by the fact that Petition 870190092767, of 9/17/2019, p. 
120/141 8/10 comprises: a receiver configured to obtain virtual reality data, in which the virtual reality data represents a 360 degree view of a virtual environment; and a processor configured to store virtual reality data in a file, where virtual reality data is stored in the file according to a file format, where the file format specifies the position within the file of the data virtual reality and specifies the position within the information file associated with the virtual reality data, in which the information associated with the virtual reality data is stored inside a track box; and store a sample entry from the track box, where the sample entry is associated with one or more samples, where the sample entry indicates that the track is a timed metadata track that contains information in a display window most visualized associated with virtual reality data. [25] 25. Apparatus according to claim 24, characterized by the fact that the information in the most viewed display window associated with virtual reality data comprises identification data of a type of shape and identification data of a region display window spherical specified by four large circles. [26] 26. Method, according to claim 24, characterized by the fact that the information in the most viewed display window associated with reality data Petition 870190092767, of 9/17/2019, p. 121/141 Virtual 9/10 comprise identification data of a shape type and identification data of a spherical rectangular display window specified by two yaw circles and two slope circles. [27] 27. Apparatus, according to claim 24, characterized by the fact that the most viewed display window is associated with a time of presentation of virtual reality data to a user. [28] 28. Apparatus, according to claim 27, characterized by the fact that the most viewed display window associated with virtual reality data is selected from the group consisting of: a display window fully covered by a set of most requested image regions based on statistical indications of viewing virtual reality data at presentation time, a recommended display window for displaying virtual reality data, a standard display window without user control over a virtual reality data display orientation, a display window defined by the director of virtual reality data, and a display window defined by the producer of virtual reality data. [29] 29. Apparatus, according to claim 24, characterized by the fact that extracting virtual reality data from files comprises extracting virtual reality data from one or more media tracks in the file. Petition 870190092767, of 9/17/2019, p. 122/141 10/10 [30] 30. Apparatus according to claim 24, characterized by the fact that the file format is based on a media file format of the International Organization for Standardization (ISO).